What Can SortCL Do?
The Sort Control Language (SortCL) program in the IRI CoSort product or IRI Voracity platform accepts multiple inputs, including:
- sequential (delimited or fixed-position), COBOL index, and semi-structured (flat JSON/XML) files
- pipes
- relational (and some NoSQL) database tables (collections) via ODBC
- URLs for static and streaming sources, including S3/GCP/AzureBlob, HTTP/S, FTP/S, HDFS, MongoDB, Kafka, and MQTT
- user procedures
SortCL reads these sources in multiple formats, processes them in many ways, and produces one or more targets in multiple formats, along with customized reports, all at once. See the table below and this diagram in the context of CoSort, or the data integration, migration, governance, and analytic portions of this diagram in the broader context of Voracity.
Specifically, SortCL can, in one job script and I/O pass, rapidly perform and combine data transformation, conversion, protection, reporting, and related processes:
| Function | Actions |
|---|---|
| | At the byte, field, and record level, plus duplicate removal and saving |
| | Conditional (include/omit) selection with if-then-else, else-if logic |
| | Multiple keys, directions, sequences |
| | Two or more pre-sorted files |
| | Two or more un/sorted sources on many conditions for ETL, file compares, and change data capture (delta reporting) ops |
| | Parallel roll-up and drill-down sum, min, max, average, and count values; accumulate (running); rank; lead and lag (sliding value windows) |
| Check | Verify source data is pre-sorted prior to sort or join operations |
| | Resize, reposition, and realign fields |
| | Change data types (e.g., EBCDIC<>ASCII, Packed<>Numeric) |
| | Convert between file formats (e.g., Text<>XML<>VS<>RS<>ISAM<>Vision<>LDIF<>CSV<>JSON) |
| | De-normalize and normalize dimensional layouts |
| | De-duplicate, validate, homogenize, filter, find/replace, and re-structure |
| | Integrate and segment data; enhance row and column detail; create new data forms and layouts through conversions, calculations and expressions, and composite (templates) |
| | via remapping and replication of columns and tables |
| | Math and trig functions across detail and summary rows, plus internal and external stats functions |
| | Bit-level manipulations and Perl-compatible regular expression logic for pattern matching, etc. |
| | Check that character and field attributes match their specifications (i.e., "iscompares", gap analysis) |
| Sequence | For custom indexing, reporting, and database load operations, plus UUID/GUID value insertion |
| | Discrete field substitutions, pseudonymization, etc., using "set" file field dimensions |
| Fuzzy Lookup | For slowly changing dimension (SCD) reporting and data quality |
| | Get discrete (lookup) values and virtualize results in reports and replicas |
| Mask (Protect) | Encrypt and mask data at the field level and audit data security measures; also anonymization, de-identification, filtering, and pseudonymization |
| Mask (Format) | Numeric, date, and string layout masking to create or replace new value formats |
| Lookup | Discrete or random draws from set files for use in ETL lookup transforms, pseudonymization, and test data generation |
| | Create randomly-generated or set-selected (safe) test data files (see RowGen) |
| | Custom-formatted, segmented detail, and summary targets |
| | Copy, manipulate, and move data from one or more sources to one or more targets |
| | Complex field-level user functions (e.g., 3rd-party DQ libraries) |
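
To make the idea of combining several of these functions in one I/O pass more concrete, here is a minimal conceptual sketch in plain Python. It is not SortCL syntax and does not reflect how SortCL is implemented; the sample records, field names, and the `run_job` helper are all hypothetical. It reads a source once, applies a selection condition, rolls up totals by key, and then emits both a sorted detail target and a summary target.

```python
import csv
import io
from collections import defaultdict

# Hypothetical delimited source: region, customer, amount
SOURCE = """region,customer,amount
west,acme,120.50
east,zenith,75.00
west,birch,30.25
east,acme,10.00
north,delta,0.00
"""

def run_job(source_text):
    """Read the source once; return a sorted detail target and a summary target."""
    detail = []
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(source_text)):
        amount = float(row["amount"])
        if amount <= 0:
            continue  # selection: omit records with no value
        detail.append((row["region"], row["customer"], amount))
        totals[row["region"]] += amount  # aggregation: roll-up by region
    # sort: region ascending, then amount descending
    detail.sort(key=lambda rec: (rec[0], -rec[2]))
    summary = sorted(totals.items())
    return detail, summary

detail, summary = run_job(SOURCE)
for rec in detail:
    print(rec)
for region, total in summary:
    print(region, total)
```

The point of the sketch is only that filtering, aggregation, and multiple outputs can share a single read of the input rather than requiring separate passes.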
Beyond data staging, manipulation, and migration, use SortCL to report on changed data (inserts, updates, deletes), slowly changing dimensions, and trend line intersection.
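
As an illustration of what a changed-data (delta) report contains, the following Python sketch compares two keyed snapshots and classifies each key as an insert, update, or delete. This is a generic technique shown under assumed data, not SortCL code; the `delta_report` function and the sample snapshots are hypothetical.

```python
def delta_report(old, new):
    """Classify differences between two snapshots keyed by record ID."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserts, updates, deletes

# Hypothetical snapshots: customer ID -> city
yesterday = {101: "Austin", 102: "Boston", 103: "Chicago"}
today = {101: "Austin", 102: "Denver", 104: "Eugene"}

inserts, updates, deletes = delta_report(yesterday, today)
print("inserts:", inserts)
print("updates:", updates)
print("deletes:", deletes)
```

Here key 104 appears only in the newer snapshot (insert), key 103 only in the older one (delete), and key 102 exists in both with a different value (update).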
Additional SortCL features support: metadata and master data management, clickstream analytics (data webhousing), real-time and near-real-time processing, customer data integration and segmentation, data wrangling (data preparation for BI and analytics), and multiple data governance objectives.