Challenges
Data matching between large tables for query, reporting, or virtualization purposes can take a long time. SQL join operations are typically inefficient in large-scale data integration (unification) operations. Custom programs designed to bring unlinked items together may also be slow or difficult to maintain.
You may also need a fast and easy way to compare two or more files over one or more fields. How do you identify the changes (inserts, updates, deletes) that have occurred between two files, especially when the data are in different file formats or in tables in different databases?
Solutions
The Sort Control Language (SortCL) program in the IRI CoSort data transformation package and IRI Voracity data management (ETL) platform can simultaneously filter, sort, join, aggregate, and reformat multiple table and file sources.
SortCL uses simple, explicit 4GL text files to define data sources, targets, and transformations. Automatic script creation, cross-platform execution, modification, and management are supported in the free Eclipse GUI, IRI Workbench.
SortCL supports inner and outer data joining functionality to produce combined outputs and file compares based on specified conditions. With SortCL you can:
- Input, join, and output one or more pre-sorted or unsorted tables and/or files
- Eliminate inner join results from an outer join
- Eliminate and reformat null records
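To make the join terms above concrete, here is a minimal Python sketch of the underlying set logic. It is not SortCL syntax and says nothing about how SortCL implements joins; the function `join_rows` and the sample field names are hypothetical, chosen only for illustration. It separates matched rows (the inner join) from unmatched rows on each side, so a full outer join is all three parts together, and "eliminating inner join results from an outer join" leaves only the two unmatched parts.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def join_rows(left: List[Row], right: List[Row], key: str) -> Tuple[List[Row], List[Row], List[Row]]:
    """Hypothetical helper: split two row sets into
    (inner matches, left-only rows, right-only rows) on one key field."""
    # Index the right side by key so each left row is matched with a dict lookup.
    right_by_key: Dict[str, List[Row]] = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    inner: List[Row] = []
    left_only: List[Row] = []
    matched_keys = set()
    for l in left:
        matches = right_by_key.get(l[key])
        if matches:
            matched_keys.add(l[key])
            # One combined row per matching pair (right fields win on name clashes).
            inner.extend({**l, **r} for r in matches)
        else:
            left_only.append(l)

    right_only = [r for r in right if r[key] not in matched_keys]
    return inner, left_only, right_only


if __name__ == "__main__":
    # Illustrative sample data only.
    customers = [{"id": "1", "name": "Ann"}, {"id": "2", "name": "Bob"}]
    orders = [{"id": "2", "total": "40"}, {"id": "4", "total": "99"}]

    inner, left_only, right_only = join_rows(customers, orders, "id")
    full_outer = inner + left_only + right_only   # matches plus non-matches
    non_matches = left_only + right_only          # outer join minus inner results
    print(inner, non_matches, sep="\n")
```

Because the input rows are indexed rather than merged in order, this sketch does not require pre-sorted inputs; a sort-merge approach would be the usual alternative for data that does not fit in memory.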
In the same simple job script and I/O pass as the join, you can also:
- Cross-calculate and derive new values from matched results
- Add field-level data masking functions to sensitive fields (PII)
- Custom-define multiple detail and summary report targets
- Hand off selected information in different formats for data visualization tools
The bottom line? The big data matching techniques supported in SortCL join operations allow you to bring matching data from disparate sources together, exclude non-matches, compare files and table data externally, capture changed data, produce business intelligence from it, and reduce database query and refresh overhead.
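The file-compare idea behind changed data capture can be sketched in a few lines of Python. Again, this is an illustration of the concept and not SortCL code; `load_keyed`, `diff_keyed`, and the `id`/`price` fields are hypothetical names, and the sketch assumes both files are CSV with a unique key column. Keys found only in the new file are inserts, keys found only in the old file are deletes, and keys in both files whose rows differ are updates.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def load_keyed(text: str, key: str) -> Dict[str, Row]:
    """Parse CSV text into a dict of rows indexed by a (assumed unique) key column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: dict(row) for row in reader}


def diff_keyed(old: Dict[str, Row], new: Dict[str, Row]) -> Tuple[List[Row], List[Tuple[Row, Row]], List[Row]]:
    """Return (inserts, updates, deletes) between two keyed snapshots."""
    inserts = [new[k] for k in new if k not in old]
    deletes = [old[k] for k in old if k not in new]
    # An update is a key present in both snapshots whose field values differ.
    updates = [(old[k], new[k]) for k in new if k in old and old[k] != new[k]]
    return inserts, updates, deletes


if __name__ == "__main__":
    # Illustrative snapshots only.
    old_text = "id,price\n1,10\n2,20\n3,30\n"
    new_text = "id,price\n1,10\n2,25\n4,40\n"

    inserts, updates, deletes = diff_keyed(
        load_keyed(old_text, "id"), load_keyed(new_text, "id")
    )
    print("inserts:", inserts)
    print("updates:", updates)
    print("deletes:", deletes)
```

Sources in different formats would first need to be normalized to the same field layout before a comparison like this is meaningful.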
Did you know that IRI CoSort was the first data management product to join data in flat files? IRI introduced join operations in CoSort in 1999.