Select / Filter

 

Next Steps
Overview Aggregate Cross-Calculate Custom Transforms Formats/Keys Lookup Match/Join Pivot/Unpivot Data Remapping Scrub/Cleanse Select/Filter Sort/Merge Substrings Type-Convert

Challenges


You need to perform bulk data filtering or custom data selection to reduce data volume or perform selective data extraction for a variety of reasons, including:

  • data archival, migration, replication and federation (data mesh)
  • report generation, business intelligence, and analytics; e.g., customer segmentation
  • database subsetting (and masking) to provide realistic but safe test schema
  • answer Data Subject Access Requests (DSARs) to comply with GDPR or similar privacy laws
  • data deduplication, data reduction, data forensics, and data quality initiatives

SQL select statements are great for "normal" database table sizes but, in volume (and depending on key relationships), can take a long time. They also can also hamper bulk unload and data subsetting jobs.

Data selection techniques outside the RDB realm also vary among different data sources and silos, requiring different data filtering commands. This can be particularly onerous in heterogenous data integration contexts, and require complex ETL or coding work.

Solutions


The SortCL program in IRI CoSort (and IRI Voracity) offers several horizontal (record/row-level) and vertical (field/column-level) selection capabilities for any specified set of structured data sources on-premise or in the cloud. CoSort/SortCL spin-off products -- including IRI FieldShield (for data masking), IRI NextForm (for data migration and replication), and IRI RowGen (for test data generation) -- all have the same row and column selection (filtering) capabilities.

Manage the size, number, and flow of records before, during, and after data transformation and reporting. Eliminate, reduce, or hand-off of records for fast, SQL-oriented data filtering during input and output. Bulk data reduction increases efficiency for SortCL transformations, reports, loads, and other downstream processes.

Use direct /QUERY syntax in SQL in the input phastof a SortCLjob, or native SortCLsyntax for condition logic (e.g., if-then-else expressions) to include, omit, and reformat records in various phases of your script for data transformation, reporting, and/or protection. Filter records by data class or column name, or record-specific conditions. For example, you can specify which value ranges are valid and output only records inside or outside of those ranges.

Remove duplicate records. Validate character forms and perform other data integrity checks to reject or isolate bad records before loading them into a database.

Base these specifications on your business logic, and save them in text scripts managed in Eclipse, and shared in Git, for easy modification and re-use.

Share this page

Request More Information

Live Chat

* indicates a required field.
IRI does NOT share your information.