Addressing Value-Level Issues

 

Next Steps
Overview Aggregate Cross-Calculate Custom Transforms Formats/Keys Lookup Match/Join Pivot/Unpivot Data Remapping Scrub/Cleanse Select/Filter Sort/Merge Substrings Type-Convert

Challenges


There are many lower-level, data-value-related problems that ETL architects face, such as:

Issue Examples or Details
Applying masks to data formats to represent or re-interpret values

ID'inggender, 'F' or '2' or masking dates: mm/dd/yy to
dd-mm-yyyy

Homogenizing inconsistent production data keys
assigning unique numeric surrogates for recognition and faster indexing
Pivoting and unpivoting
transposing columns to rows and vice versa without duplicates
Inserting constants or literal strings
replacing or adding substitute values to protect or pre/re-label data
Setting or labeling values
evaluating a condition to determine a default label like 'NULL'
Validating values
determining if the inbound field is printable, or of a certain data type
Re-casting values to conform to reference standards
unifying different data representations (like country names)
Slowly changing dimensions in source tables
propagating new values (like a price change) in warehouse tables

Solutions


The IRI Voracity ETL/ data management platform and the IRI CoSort tool powering Voracity address all the above data handling issues, plus:

  • key and other data transformations
  • data type and data format conversion
  • basic and advanced report generation
  • PII masking and custom test data synthesis

through their common 4GL data manipulation program called SortCL, which is also supported in their Eclipse GUI called IRI Workbench.

With SortCL, you can:

  • Remap dates and other data formats using new composite values, and re-usable data types
  • Re-assign source key names to adhere to new ontologies
  • Pivot (denormalize) for efficient queries, and unpivot(normalize) before loading data to the warehouse
  • Use in-line 'data' statements to add strings to any location, and with custom repetition counts
  • Use conditional data or field logic to label qualifying values, like inserts vs. deletes vs. updates, etc.
  • Use supplied data validation functions or third-party libraries to assess data characteristics or quality
  • Use integrated Perl Compatible Regular Expression (PCRE) logic to match and re-assert data patterns
  • Handle and report on multiple types of slowly changing dimensions and other changed data

IRI data manipulation tools like CoSortand RowGen (for synthetic test data generation) create unique initial or surrogate primary keys with features like:

  1. SEQUENCER, a specially named index field with custom starting and increment values
  2. random data generation, to create random numbers or strings, which can also be sorted and de-duplicated
  3. ROWID, a field with any name but specified as this attribute with features beyond SEQUENCER
  4. UUID (and GUID), a field-level transformation/generation function that can be used in any part of a job
Share this page

Request More Information

Live Chat

* indicates a required field.
IRI does NOT share your information.