Challenges
Even after consulting and tuning, big data volumes (i.e., more than one million rows) can be slow to transform, particularly without an expensive hardware upgrade or DataStage version upgrade.
ETL performance bottlenecks include large sorts, joins, aggregations, loads, and sometimes unloads. Parallelization or optimization in other layers or tools can be unwieldy, if not expensive, and may adversely affect data delivery times for other users.
From a security standpoint, IBM's data masking solutions may be expensive or cumbersome for some users, and may not provide all of the PII discovery or privacy functions that others need.
Solutions
Speed DataStage Transforms
Speed sorts, joins, and aggregations in DataStage with a one-pass operation by calling the CoSort Sort Control Language (SortCL) program in a sequential file stage or Before-Job Subroutine. Run large data transformations without encumbering other jobs in DataStage, your database, or BI tool. You can also specify file-format and data-type conversions, field-level masking functions, custom reports, and pre-sorted load files.
For faster aggregations specifically, add a sequential file stage prior to the aggregation stage. In that sequential file stage, run a SortCL job to externally pre-sort the file on the break keys, and then define the sorted fields in the aggregation stage.
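To illustrate why pre-sorting on break keys helps the aggregation step, here is a minimal Python sketch. It is not SortCL or DataStage code. It only shows the general idea that once equal keys are adjacent, totals can be computed in a single streaming pass while holding one group in memory at a time. The field names (`region`, `amount`) and the sample data are hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def aggregate_presorted(rows, key_field, value_field):
    """Yield (key, total) pairs from rows already sorted on key_field.

    Because equal keys are adjacent in sorted input, only one group's
    running total is held in memory at a time.
    """
    for key, group in groupby(rows, key=itemgetter(key_field)):
        yield key, sum(float(row[value_field]) for row in group)


# Hypothetical sample data, already sorted on the break key "region".
PRESORTED_CSV = """region,amount
east,10.0
east,5.5
north,7.0
west,1.0
west,2.0
"""


def run_demo():
    """Aggregate the sample file and return the totals as a list."""
    reader = csv.DictReader(io.StringIO(PRESORTED_CSV))
    return list(aggregate_presorted(reader, "region", "amount"))


if __name__ == "__main__":
    for region, total in run_demo():
        print(region, total)
```

If the input were not sorted on the break key, `groupby` would emit the same key more than once, which is why the external pre-sort comes first.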
Mask PII in DataStage
Data at rest in tables and flat files within DataStage can be sensitive, containing personally identifiable information that is subject to confidentiality restrictions and data privacy laws. SortCL, whether licensed in CoSort or run through the compatible IRI FieldShield data masking product or the IRI Voracity data management (and ETL) platform, can protect column/field values in any ODBC-connected database or standalone flat-file source.
Your business rules dictate the masking function you apply to each column; e.g., format-preserving AES-256, FIPS-compliant OpenSSL, 3DES, and/or GPG encryption, lookup-value substitution (pseudonymization), character masking, hashing, redaction, custom expression logic, substring, or user field function.
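As a rough illustration of per-column masking rules, the Python sketch below applies three of the function types named above (character masking, hashing, and lookup-value substitution) to a single record. It is an assumption-laden example, not the FieldShield or SortCL implementation. The column names, the `SECRET` key, and the substitute name list are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-key-only"  # hypothetical key; manage real keys securely
SUBSTITUTE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe"]  # hypothetical lookup set


def mask_characters(value, keep_last=4, mask_char="*"):
    """Replace all but the last keep_last characters with mask_char."""
    if keep_last <= 0:
        return mask_char * len(value)
    return mask_char * max(len(value) - keep_last, 0) + value[-keep_last:]


def hash_value(value, secret=SECRET):
    """Return a keyed SHA-256 digest (HMAC) of value as a hex string."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(value, lookup=SUBSTITUTE_NAMES, secret=SECRET):
    """Pick a substitute from lookup; the same input gives the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return lookup[int.from_bytes(digest[:8], "big") % len(lookup)]


# One masking function per sensitive column; other columns pass through.
RULES = {
    "ssn": mask_characters,
    "email": hash_value,
    "name": pseudonymize,
}


def mask_row(row, rules=RULES):
    """Apply each column's rule, leaving columns without a rule unchanged."""
    return {col: rules.get(col, lambda v: v)(val) for col, val in row.items()}


if __name__ == "__main__":
    record = {"name": "Jane Smith", "ssn": "123-45-6789",
              "email": "jane@example.com", "state": "FL"}
    print(mask_row(record))
```

Keeping the rule table separate from the functions mirrors the idea that business rules, rather than code changes, decide which protection each column receives.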
Build DataStage Test Data
IRI Voracity, through its constituent (or standalone) IRI RowGen software product, generates safe, realistic test data using COBOL or CoSort metadata, .dsx-defined files, and any RDB data models connected through JDBC. Use RowGen to create compliant, realistic test data from random generation and/or set-file selection, and customize it further with built-in data manipulation and formatting functionality. Voracity also includes database subsetting and masking for testing in lower environments.
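The combination of random generation and set-file selection can be pictured with a small Python sketch. This is not RowGen syntax. It assumes a set file is simply a list of allowed values, and the column names, value ranges, and `FIRST_NAMES` list are hypothetical.

```python
import random

# Hypothetical set-file contents: one allowed value per entry.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dee"]


def generate_rows(count, names=FIRST_NAMES, seed=0):
    """Build count test rows from set selection and random generation.

    A fixed seed makes the output repeatable between test runs.
    """
    rng = random.Random(seed)
    rows = []
    for row_id in range(1, count + 1):
        rows.append({
            "id": row_id,                                 # sequential key
            "name": rng.choice(names),                    # set-file selection
            "balance": round(rng.uniform(0, 10000), 2),   # random generation
        })
    return rows


if __name__ == "__main__":
    for row in generate_rows(5):
        print(row)
```

Because no production values are read, rows built this way contain no real PII, which is the property that makes generated test data safe to share.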
Re-platform DataStage Jobs
Facilitate migration from DataStage to faster, more affordable ETL operations in IRI Voracity using AnalytiX DS (now erwin) Mapping Manager or Code-Automation Frameworks (CATfx). This proven technology, as well as erwin Lite Speed Conversion services, finally gives ETL architects and the CIO/CFO suite the ability to save hundreds of thousands of dollars immediately, and shift to a low-cost operational expense going forward.