Challenges
Evaluating the performance of hardware platforms and software applications requires realistic, production-like data. Files and tables must be the right sizes, and contain the right data types, file formats, record layouts and counts, and value (data frequency) distributions.
Standard benchmarks promulgated by organizations like the Transaction Processing Performance Council (TPC) may similarly require a wide range of pre-determined volumes and layouts of test data.
Large files and tables can take a long time to build and load without the right tools and techniques. Culling sample data from production can be slow, and can violate data privacy rules.
Test data tools like TDG or Snowfakery for Salesforce may also be difficult to use, or require specialized programming knowledge (e.g., Java or YAML). More elaborate test data management hubs that generate synthetic test data can be expensive, and are not designed to accommodate the customization and high-volume speed that many system benchmarks require.
Solutions
The IRI RowGen test data tool -- or the IRI Voracity data management platform, which includes RowGen -- can synthesize safe, big test data files -- in CSV, JSON, XML, LDIF, ASN.1, COBOL and many other structured (even report) formats -- and insert or bulk-load intelligent data into relational and NoSQL database platforms.
With RowGen, you can generate a complete and consistent battery of files and tables to stress-test different software and hardware platforms. Uniquely embedded data transformation functionality can also help you perform a test data quality assessment or evaluate the best processing paradigms for your environment. Read on for more details ...
RowGen can create any number (and size) of files or relational tables, with any number of columns in any fixed or delimited position, and more than 100 different data types available. It can also automatically generate and load test data for multiple targets in multiple formats simultaneously.
With RowGen you can even select (filter) and transform the synthetic record (row) and field (column) data to emulate production data and simulate how downstream transformation logic will affect that data. You can also decide to preserve or change the generated values in successive runs through random seed management.
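RowGen's own job-script syntax is not shown here, but the idea behind seeded reproducibility can be sketched in plain Python (all names below are hypothetical illustrations, not RowGen APIs):

```python
import random

def generate_rows(n, seed=None):
    """Generate n synthetic (id, amount) rows; a fixed seed reproduces the run."""
    rng = random.Random(seed)  # an isolated generator, independent of global state
    return [(i, round(rng.uniform(1.0, 999.99), 2)) for i in range(1, n + 1)]

# Reusing the same seed yields identical test data across runs;
# omitting (or changing) the seed yields fresh values each time.
run1 = generate_rows(5, seed=42)
run2 = generate_rows(5, seed=42)
assert run1 == run2
```

Preserving the seed is useful for repeatable benchmark runs; changing it lets you vary the data while keeping the same layout and volume.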
Where benchmarking of database prototypes, Data Vault architectures, or data warehouse ETL operations is required, RowGen will respect the layout and relationships of production tables defined in existing DDL. It will build a batch script you can run to rapidly create and populate test DB targets that are structurally and referentially correct.
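As a rough illustration of what referentially correct generation means, the sketch below (generic Python, not RowGen's DDL-driven batch script; the table and column names are invented) builds a parent table first and draws every child foreign key from the parent's key set:

```python
import random

rng = random.Random(7)

# Parent table: unique primary keys
customers = [{"cust_id": i, "name": f"Customer_{i}"} for i in range(1, 101)]

# Child table: each foreign key is drawn from the parent's key set,
# so the generated data is referentially correct by construction.
cust_ids = [c["cust_id"] for c in customers]
orders = [
    {"order_id": n, "cust_id": rng.choice(cust_ids),
     "total": round(rng.uniform(5.0, 500.0), 2)}
    for n in range(1, 1001)
]
```

Generating parents before children in this way guarantees that bulk loads into foreign-key-constrained targets will not fail on orphaned rows.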
Each value within your test datasets can be either randomly generated, or randomly selected from set files or numeric ranges, making the data as realistic as necessary.
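The difference between purely random values and values drawn from a set file or numeric range can be sketched like this (a hypothetical Python analogy; RowGen reads real set files from disk, and these field names are illustrative only):

```python
import random

rng = random.Random(1)

# A "set file" of realistic values to draw from (inline here for brevity)
CITIES = ["Boston", "Chicago", "Denver", "Austin"]

def make_row():
    return {
        "city": rng.choice(CITIES),               # selected from a set -> realistic
        "score": rng.randint(300, 850),           # drawn from a numeric range
        "token": "".join(rng.choices("ABCDEF0123456789", k=8)),  # purely random
    }

rows = [make_row() for _ in range(10)]
```

Selection from curated sets produces plausible values (real city names, valid score ranges), while pure random generation suffices for fields where realism does not matter.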