Challenges
String-level expression logic is used to search, manipulate, and report on data according to certain patterns or rules. This functionality is often associated with text editors, SQL and shell commands, but is rarely integrated into high volume data processing operations.
In other words, string parsing, pattern matching, and other low-level data manipulations must usually occur in separate tools and I/O steps, which increases coding and processing overhead.
Some tools for ETL, data quality, and reporting also lack the kind of substring functionality needed to accommodate special use cases, like date value manipulation or sensitive character replacement.
Solutions
The SortCL program in IRI CoSort and IRI Voracity supports Perl Compatible Regular Expression (PCRE) logic for pattern matching, as well as find and replace functions, substring extraction, and other string and substring manipulation tasks. SortCL also supports field padding and alignment, character validation, and field re-mapping.
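To make these categories of operation concrete, here is a minimal sketch in Python. It is not SortCL syntax and does not reflect how SortCL implements these functions; the field values and function names (`is_valid_ssn`, `mask_ssn`, `year_of`, `pad_field`) are hypothetical, chosen only for illustration. Python's `re` module is not PCRE, though the constructs used here (`\d`, `\D`, lookahead) exist in both.

```python
import re

# Assumed layout for a US Social Security number: 3 digits, 2 digits, 4 digits.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def is_valid_ssn(value: str) -> bool:
    """Character validation: the whole value must match the expected layout."""
    return SSN_PATTERN.fullmatch(value) is not None


def mask_ssn(value: str) -> str:
    """Find and replace: replace every digit that has at least four digits after it."""
    return re.sub(r"\d(?=(?:\D*\d){4})", "X", value)


def year_of(date_text: str) -> str:
    """Substring extraction: the first four characters of a YYYY-MM-DD date."""
    return date_text[:4]


def pad_field(value: str, width: int) -> str:
    """Field padding and alignment: left-justify to a fixed width with spaces."""
    return value.ljust(width)
```

For example, `mask_ssn("123-45-6789")` leaves only the last four digits readable, and `year_of("2023-07-04")` returns the year portion of the date as a string.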
These functions are also useful in the context of data discovery, master data management, and data quality improvement.
More importantly, substring extraction and other intricate string manipulations can occur in the same job script and I/O pass as the other functions SortCL performs simultaneously, such as:
- Data Transformation (sort, join, aggregate, filter, remap, etc.)
- Data Migration (of data types and file formats)
- Data Protection (field level encryption, de-ID, masking, etc.)
- Reporting (batch, delta, detail and summary BI)
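The idea of combining string manipulation with other functions in one read of the data can be sketched as follows. This is a Python illustration of the concept only, not a SortCL job script; the input columns, sample values, and the `single_pass` function are assumptions made up for the example.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical input; column names and values are invented for illustration.
SAMPLE = """order_date,card,amount
2023-01-15,4111-1111-1111-1111,20.00
2023-03-02,5500-0000-0000-0004,5.50
2024-02-10,4111-1111-1111-1111,12.25
"""


def single_pass(source):
    """Read each row once: extract the year, mask the card, and total by year."""
    masked_rows = []
    totals = defaultdict(float)
    for row in csv.DictReader(source):
        year = row["order_date"][:4]  # substring extraction
        # Masking: replace every digit that has at least four digits after it.
        card = re.sub(r"\d(?=(?:\D*\d){4})", "*", row["card"])
        totals[year] += float(row["amount"])  # aggregation
        masked_rows.append({"year": year, "card": card, "amount": row["amount"]})
    # Sort the in-memory rows; the input itself is not read a second time.
    masked_rows.sort(key=lambda r: (r["year"], r["card"]))
    return masked_rows, dict(totals)


rows, totals = single_pass(io.StringIO(SAMPLE))
```

Here the substring, masking, and aggregation steps all run inside one loop over the input, so no intermediate file is written between them.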