What’s New in CoSort 10.5?
Abstract
Three years after the release of CoSort V10, a significant interim update to IRI’s primary data transformation software package has been announced. This article summarizes what’s new in the CoSort high-performance data sorting and ETL tool since Version 10.0.1 was introduced.
IRI introduced CoSort 10.5 on June 30, 2021 as the general release product and optional update for customers under maintenance. For those upgrading from CoSort Version 9 or below, please refer to this article on the path to Version 10 for advice.
The 10.5 release, announced here, adds support for SortCL program processing of data in Excel spreadsheets, ASN.1-compatible CDRs, and Data Vault 2.0 (DV2) database models. A series of 4 articles on SortCL support for Excel starts here, and a series of 5 articles on SortCL support for ASN.1 starts here. The first article on Data Vault is linked below.
New features in CoSort 10.5, added at that time and since, also include:
Cloud Store Data Sources & Targets
The SortCL program can read and write the structured files it processes in Amazon S3, Azure Blob, and Google Cloud Platform (GCP) storage buckets, as well as in OneDrive via its sync feature with LAN SMB folders.
For more information, see this article.
Direct Database Value Lookups
This was a frequently requested enhancement to external set file value lookup tables. Now, live values in ODBC-connected database tables can be read and used in SortCL and compatible product scripts for data transformation, reporting, pseudonymization and synthesis.
Sample /FIELD syntax for this feature is:
/FIELD=(ADJ_CLOSE, TYPE=ALPHA_DIGIT, POSITION=6, SEPARATOR=",", SET="new_schema2.pricedata;DSN=New MySQL;" [DATE] DEFAULT="170.5" LOOKUP="Date,Adj_Close")
For more information, see this article.
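To make the lookup-with-default idea concrete, here is a minimal Python sketch of the behavior the sample /FIELD statement appears to describe: find the Adj_Close value whose Date matches the incoming record, and fall back to a default when no row matches. This is not SortCL and not IRI's implementation. The in-memory SQLite table stands in for the ODBC-connected table, the sample rows and the lookup_value helper are hypothetical, and reading DEFAULT as a no-match fallback is an assumption.

```python
import sqlite3


def lookup_value(conn, key, default):
    """Return Adj_Close for the given Date, or `default` when no row matches."""
    row = conn.execute(
        'SELECT Adj_Close FROM pricedata WHERE "Date" = ?', (key,)
    ).fetchone()
    return row[0] if row is not None else default


# In-memory stand-in for the live database table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE pricedata ("Date" TEXT PRIMARY KEY, Adj_Close TEXT)')
conn.executemany(
    "INSERT INTO pricedata VALUES (?, ?)",
    [("2021-06-29", "136.33"), ("2021-06-30", "136.96")],
)

# The last date has no matching row, so it receives the default value.
dates = ["2021-06-29", "2021-06-30", "2021-07-01"]
adjusted = [lookup_value(conn, d, default="170.5") for d in dates]
print(adjusted)  # ['136.33', '136.96', '170.5']
```

Because the value is fetched at run time rather than from an exported set file, a change to the table is reflected the next time the job runs.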
Encrypted Credentials & Passphrases
SortCL jobs involving RDB sources and targets can obfuscate the user name and password in the ODBC connection strings in the /INFILE and /OUTFILE statements. Encrypted database login credentials can take the place of plain text username and password in job scripts. DSN file names are also supported.
In addition, field-level encryption passphrases used in FieldShield masking jobs (which are also supported in part within CoSort) can be encrypted, following the ‘epass’ argument in the /FIELD statement. Other encryption key (passphrase) management methods are described here.
IRI Workbench Updates
The latest version of IRI Workbench supporting CoSort 10 features is built on Eclipse 2020-06 with embedded 64-bit JRE 1.8. Support for JDK 11 and future releases is under development. As of 2024, JDK 17 is used.
Dashboard visualizations for data profiling and data-class based search results now supplement prior text-based logs to report on data located for profiling, quality, and masking purposes.
Full details are provided in this article.
There are also new dialogs for test data generation rules for supported RowGen synthesis or FieldShield masking users, which are enabled in CoSort and Voracity via SortCL.
Refer to this article for more details.
New Wizard for Data Vault 2.0
The Data Vault Generator can convert existing relational database tables into a Data Vault 2.0 architecture. Three options are available, all of which create the DDL and ER diagram. The first option creates only those two artifacts.
The second option loads the tables with the generated hash keys and source data. The third option loads the tables with the generated hash keys and generated data. The generated data can be customized through the use of generation rules.
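As a rough illustration of what a generated hash key can look like, the Python sketch below derives one from a record's business key. It follows a convention commonly described for Data Vault 2.0 (trim, uppercase, join with a delimiter, then hash with MD5). The normalization rules, the "||" delimiter, the MD5 choice, and the hub_hash_key name are all assumptions for illustration; the wizard may generate its keys differently.

```python
import hashlib


def hub_hash_key(*business_keys, delimiter="||"):
    """Build a deterministic hash key from one or more business key parts.

    Assumed convention: trim and uppercase each part, join the parts with
    a delimiter, and return the MD5 digest as an uppercase hex string.
    """
    normalized = delimiter.join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


# The same business key yields the same hash key regardless of case or padding.
key_a = hub_hash_key(" cust-001 ")
key_b = hub_hash_key("CUST-001")
print(key_a == key_b)  # True

# A composite key (for example, for a link table) hashes its parts together.
link_key = hub_hash_key("CUST-001", "ORD-42")
print(len(link_key))  # 32
```

Normalizing before hashing is what lets the same business key arriving from different source systems map to the same hub row.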
Contact voracity@iri.com for more details on the Data Vault creations supported in the wizard.
Recent Third-Party Integrations
CoSort 10.5 users in Voracity can also leverage additional third-party software tie-ups to SortCL-compatible functionality announced after 2018, including:
Analytics
The purpose-built Voracity App for Splunk can select, run, and consume the data from CoSort SortCL jobs for use in analytics and visualization.
See this article for more details.
Because CoSort SortCL job results can manifest in multiple targets and formats at once, including JSON and XML, a Datadog agent can automatically collect and parse this data into what it calls logs.
Once the CoSort-processed data is in Datadog, and any facets of the data have been specified based on extracted attributes, you can work with and visualize the data in that application.
Similarly, Cubeware Cockpit is a data visualization tool that can now benefit from data transformed by CoSort in Voracity, per the comparative benchmark in this article. For more information on CoSort data wrangling and other BI/analytic tool benchmarks, check out this site.
Test Data Management
SortCL-powered, CoSort-compatible IRI FieldShield or RowGen programs for data masking, subsetting, and test data synthesis can also be called into test data portals, and their target data consumed for use there. For DevOps in particular, these jobs have been integrated into ValueLabs Test Data Hub and Cigniti BlueSwan.
SortCL-compatible data masking and synthesis scripts can also be part of a GitLab or another third-party DevOps pipeline; see this article.
Similar job scripts (or batch jobs containing them) can also be called into database replication jobs to render the replicated data safe for testing. Beyond on-the-fly, ETL-style data masking in Voracity or FieldShield, cloning software from Windocks, Commvault, and Actifio can run IRI jobs during snapshot builds or containerization, and deliver these images on-premises or in managed cloud environments.
Note that some of the functionality described in this article, while driven by SortCL-compatible programs which will now run in CoSort 10.5, may not be covered under CoSort maintenance. You may need a Voracity license, or another CoSort spin-off product (like FieldShield or RowGen), to qualify for on-point IRI documentation and support. See this article for more information.
Please contact your IRI representative for more information about testing and upgrading to CoSort 10.5 for big data processing, or to learn more about any of the integrations above.