Big data transformations, conversions, protections, and reports -- all in one place, product, and pass.
Architectural Challenges |
IRI Solutions |
---|---|
Structured big data integration, cleansing, and conversion tasks take too long, or rely on complex transform designs in:
|
IRI Voracity, the total data management platform built on CoSort, Hadoop, and Eclipse. The metadata definition language and default processing engine for data manipulation in Voracity and all IRI software is the Sort Control Language (SortCL) program, first designed for IRI CoSort users. SortCL uses an open, fourth-generation language (4GL) to define, combine, and optimize transformations and mappings in the same job script and I/O pass. You can also use SortCL for data cleansing (data quality) and PII data masking, for reporting (and limited analytics), or as a data preparation tool for your BI platform. Build SortCL jobs automatically in the free Eclipse GUI (IRI Workbench), and graphically modify and visually stage them in the ETL palette. Schedule and run jobs in or outside the GUI, and team-share them in repositories like EGit. Voracity big data users can also seamlessly drive SortCL transformation jobs in Hadoop MapReduce 2, Spark, Spark Stream, Storm, and Tez. Legacy ETL tool users can convert their jobs to Voracity via erwin Smart Connector technology. IRI's standalone NextForm product is a SortCL spin-off, purpose-built for converting data types, endianness, record layouts, file formats, and databases. The IRI Workbench GUI, built on Eclipse™, also supports Voracity ETL and standalone SortCL jobs, plus NextForm data migration, FieldShield data masking, and RowGen test data operations. |
Change data capture (CDC) and other reports run in separate steps or products, and are too slow at high volume. |
Use the multi-source bulk/batch change data capture wizard in Voracity and schedule its jobs in near real-time with a high-resolution setting in the task scheduler for incremental data replication. You can also build SortCL jobs that transform and remap data within custom detail and summary reports. The reports can be used for:
SortCL also feeds BIRT reports in memory within the Eclipse GUI (IRI Workbench), or Splunk with a Voracity add-on. |
To comply with privacy regulations, you need data security, but your development and outsourced processes need access to realistic data. Most encryption methods protect data in only one database, or encrypt far more than just the fields you need to protect. Conversely, most data masking products do too little. |
Voracity or SortCL jobs that transform, migrate, and report on data (above) can also simultaneously protect data at risk. Apply field-level encryption, canned and custom data masking, de-ID, hashing, randomization, sub-string, expression, pseudonymization, shifting, and custom functions. Determine re-ID risk with peer-approved statistical functions and scores. Leverage key management, tokenization, and auditing, too. Sensitive data classification, discovery, and masking tools in the IRI Data Protector Suite -- IRI FieldShield, DarkShield, and IRI CellShield -- support content-aware data loss prevention, breach nullification, and data privacy law (CCPA, GDPR, HIPAA, PCI-DSS) compliance. |
Test data needs to look real, but using production data for testing is unsafe, unavailable, or otherwise unacceptable. Other tools require sub-setting and masking, and lack big data speed and versatility. |
Voracity or SortCL jobs that transform, migrate, report on, and secure data (above) can also be used to generate safe, intelligent test data, without needing real databases or files. The same 4GL behind SortCL data definitions and manipulations is behind the IRI RowGen test data package (which is included in Voracity, and also provides test data for ETL task previews!). Switch easily between test data generation and real data processing and presentation. In the IRI Workbench (Eclipse GUI), use SortCL or RowGen to create structurally and referentially correct test data for database tables, plus custom file and report formats, with a mix of randomly selected/generated data. Other features include:
|
Master Data Management (MDM) hubs are too costly and complex. |
Use the consolidation or registry-style MDM wizard in IRI Voracity to find and reconcile disparate values and save (or propagate) the unified master values. You can also simplify reference data and metadata, master data and master metadata through: the shared syntax of SortCL and its data definition file (DDF) format, the SortCL /TEMPLATE (composite value) function, and a future graphical IRI Workbench hub for:
|
Very large database (VLDB) utility operations like unload, load, migration, query, and reorgs are too slow. |
The IRI FACT (Fast Extract) tool for Oracle, DB2, Sybase, SQL Server, MySQL, Tibero, and Altibase unloads tables to flat files in parallel. FACT also creates SortCL DDF and loader control file metadata for optional ETL or offline reorg (unload/sort/reload) operations. CoSort pre-sorts speed loads. This facilitates the maintenance of huge fact tables in optimal query order, resulting in quicker lookup and response performance. |
A legacy sort tool is too slow, expensive, or inflexible. |
IRI CoSort has plug 'n play sort replacements, sort parm conversion routines, and/or business incentives for leaving your current sort. |
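
To make the bulk/batch change data capture idea from the table above more concrete, here is a minimal sketch in plain Python. It is not SortCL and does not reflect how the Voracity wizard is implemented; it only illustrates the general technique of comparing two keyed snapshots of a source and classifying each record as an insert, update, or delete. The function names, column names, and sample data are all hypothetical.

```python
import csv
import io


def load_snapshot(text, key):
    """Parse CSV text into a dict of rows, keyed on one column (assumed unique)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def capture_changes(old, new):
    """Compare two keyed snapshots and classify every key that differs."""
    changes = {"insert": [], "update": [], "delete": []}
    for k in sorted(old.keys() | new.keys()):
        if k not in old:
            changes["insert"].append(new[k])   # key only in the newer snapshot
        elif k not in new:
            changes["delete"].append(old[k])   # key only in the older snapshot
        elif old[k] != new[k]:
            changes["update"].append(new[k])   # same key, different field values
    return changes


# Hypothetical sample extracts taken at two points in time.
YESTERDAY = "id,name,balance\n1,Ann,100\n2,Bob,250\n3,Cy,75\n"
TODAY = "id,name,balance\n1,Ann,100\n2,Bob,300\n4,Dee,10\n"

old = load_snapshot(YESTERDAY, "id")
new = load_snapshot(TODAY, "id")
changes = capture_changes(old, new)

for kind, rows in changes.items():
    print(kind, [r["id"] for r in rows])
```

Run on a schedule against successive extracts, the resulting delta (rather than the full table) is what would be replicated to the target. This sketch holds both snapshots in memory; at high volume, a sort-merge comparison over pre-sorted files would be the more likely approach.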