Introducing the RowGen Test File Wizard
A test data generator is an important part of the setup process for DevOps and data architects prototyping database and data warehouse operations, testing applications, benchmarking different platforms, and outsourcing work formats. These users should not rely on production data in those processes because that risks exposing personally identifiable information (PII), and/or it may not reflect the characteristics of future production data.
IRI RowGen allows these architects to synthesize new, and realistic test data targets in flat file formats, and relational databases … in a single operation that supports referential integrity, applies custom formats, and, is safe. The populated test data reflects the kind of production values that users would encounter in normal database or ETL operations.
RowGen users can leverage purpose-built job wizards in the IRI Workbench GUI (built on Eclipse™) to simplify and streamline test data generation. One of the most useful wizards is the New Test Data Job wizard, and is the subject of this article. If you’re interested in generating structurally and referentially correct test data for SQL database targets, see this article.
The file-centric test data wizard constructs individual RowGen jobs with one or more custom data layouts, to produce one or more targets all in the same pass (while sorting on key fields). Users can define their own field/column layouts piece-by-piece, or they can incorporate existing metadata that already describes all or most of the test target data structures.
RowGen and its wizards support several ways to make data more realistic, including the random selection of values from set files (such as from an existing database column of values in a text file), or by using any of the available mini-wizards for creating fit-for-purpose set files on the fly.
Set files can consist of any combination of literal values and/or ranges and ensure that only unique values in a set file are selected from top to bottom, and support valid pairs in the test data.
File & Field Format Specification
In the RowGen test data generation wizard, the vehicle for specifying the file format can be defined in the source (and/or target) specification dialog, where a menu of options is offered; e.g.,
The most basic form for values that the test data generator synthesizes consists of randomly generated characters that conform to the declared data type. Alternatively of course, are random selection from set files. Either way, this can be defined field-by-field, in the input / generation phase of a job; e.g,
Here the user can see basic information about the custom layout, re-position or clear fields, and optionally declare a field separator and frame character. When an existing field is double-clicked, the Test Field Data Attributes dialog opens (see below) where the user can specify attributes for that field, including: name, position within the record, data type, min- and max-size, upper- vs. lower-case letters, left vs. right alignment, etc.
The dialog offers additional options for enhancing field-level realism such as value endianness, and the ability to specify a percentage of null values to be generated among the full set of values.
And from within the Test Field Data Attributes dialog, users can click ‘define’ to select an existing, or define a new, set file if they do not want random generation of values based solely on the declared data type.
Additional Attributes
In the New Test Data Job wizard, there are several ways for the test data generator to add data realism to a field, including the following set file creation mini-wizards:
- Range or Literal Values – Allows users to create their own literal and range entries, and save the custom list of entries to a set file. These can take the form of ASCII, NUMERIC, DATE or TIMESTAMP values.
- Compound Data Values – Allows users to create their own composite values to be saved to a set file. It is a ‘builder’ that allows you to choose from all available data types, existing set files, and other field attribute options to create a customized list of entries in the format that conforms to your business rules, e.g.: part numbers:
1-5000-AQ5|123-45-6789 B3-4956-ZV3|432-75-60893 C9-4020-BG8|543-23-6644
- Set from Column – Allows users to create a set file by selecting one or more columns from a database table/view that they select. Selection of multiple columns for inclusion in the set file is useful for valid pairs (threes, fours, etc.)
By default, all set files created with the above mini-wizards are invoked for the field currently being defined, in addition to being saved for future use, or by other applications in the test data generator.
A special field option called ROWID is also available if the user wants to provide a row number within each record, or some other steadily incrementing or decrementing value such as a customer ID number that could later be used as a key value during the optional sort phase. You can also select from the provided weighted distribution options, or apply a custom algorithm to any field to produce a range of values that best reflects the spread that would be encountered with real data.
Sorting
By default, the test data generator creates unsorted data. However, in RowGen, you can opt to continue to the sort phase while using the New Test Data Job wizard, and declare one or more test data fields as sort keys as demonstrated below:
Sorting of course using the IRI CoSort engine for speed in volume. These multi-threaded sorts of the test files can be used to bypass the sort engine within, and thus improve the performance of, bulk database load utilities like SQL*Loader and bcp.
The Target Field Layout Editor
By default, a single target file is produced to contain the test data, and there is no need to make specifications in the OUTFILE section of a RowGen job script because the layout the users specify in the INFILE section is copied to the OUTFILE section by default.
However, you can also add one or more file targets, make changes in the OUTFILE section to one or more each target’s fields — such as applying protections, performing cross-calculations, using IF THEN else logic, etc., — as shown here:
Functionality featured in the IRI CoSort and IRI FieldShield products is available if the user wishes to modify the OUTFILE section of a RowGen job. Additional targets, each designed for a specific recipient (based on business rules) with different layouts, or with certain field values, can be generated.
These multiple targets (based on one or more bespoke layouts) are produced in the same pass during test data generation. This and other functionality (including multiple set file creation dialogs), are available.
For a specific step-by-step example of how to use this wizard to customize a synthetic CSV file, see this article. For more information generally, see www.iri.com/products/rowgen or email rowgen@iri.com with any questions.