Generating Test Client Data
IRI RowGen software creates test data you can customize to meet specific needs. It supports the formats and techniques that make your test sets as realistic as you want them to be. In this article, I am using RowGen to generate realistic master, or dimensional, data in a flat file that can be used in other set files or database tables.
The base test set I’m creating contains client records that could represent a patient, borrower, or any account with a primary and secondary name. These test names would be inter-dependent, and would not, in reality, change often; e.g., where the second name is the co-signer on a loan the first name opened. I’m also generating dynamic numbers to represent realistic transaction values within the file.
How RowGen Works
RowGen synthesizes data according to the values and structures specified in 4GL job scripts. These scripts are either handwritten or auto generated in the GUI for RowGen (or for Voracity) called IRI Workbench, built on Eclipse™.
For any given column of data, RowGen can:
- randomly generate data according to the data type and width you specify
- randomly select data from a set file of new values and formats you define on the fly
- randomly select values from a pre-existing set of real values.
The best way to produce the most realistic test data is by using set files in the job scripts. RowGen’s forte is helping you create data sets that can be used to create bigger and better data sets. In fact, in the RowGen GUI, the New Test DB Job creation wizard uses this core capability to build entire RDB and EDW prototypes with referential integrity by creating and loading interdependent sets automatically. RowGen even allows you to choose from among several random selection methods and customize the distribution of values.
Set files consist of one or more columns of data separated by tabs. IRI provides set files for things such as names, cities, states, etc. You can also create set files from existing files or columns in a database. When only a few values are needed, such as yes and no, inline set files can be used. The inline set values are contained within the field statement of the job script.
Planning the Customer Set
Suppose you want to create test data for a customer-related application that will support referential integrity in an application or test database. The customer information is the data that remains relatively constant, and will be linked to more transient information such as dated transactions. To accomplish this, we simultaneously generate, in one record, all the information about the client that needs to remain consistent throughout any processing that might occur elsewhere. This master test set can be put in one table or file, and it can be used to build other tables or files.
The first step is to decide which pieces of information are necessary and what set files are to be used in the job script. Then create any set files that do not already exist. Next, where applicable, decide on the minimum and maximum size of each numeric field or column. For these test records, I will create:
- Account Number: The value will consist of 2 fields. The first, called Acct_Code has an A, B, C, D, or E, will be obtained from an inline set. The second field, called Acct_Digits, consists of 15 digits.
- Transaction Limit: This number represents how many thousands of dollars can be in one transaction; the maximum amount is 50. We will use a numeric set file called limit.set that contains:
[50,20] [19,10] [19,10] [9,7] 6 5 4 3 2 1 1
[1910-01-01,1949-12-31] [1950-01-01,1980-12-31] [1950-01-01,1980-12-31] [1981-01-01,1990-12-31] [1981-01-01,1990-12-31] [1991-01-01,1997-12-31]
[1910-01-01,1949-12-31] [1950-01-01,1980-12-31] [1950-01-01,1980-12-31] [1981-01-01,1990-12-31] [1981-01-01,1990-12-31] [1991-01-01,1997-12-31] [1998-01-01,2015-12-31]
- Street Address, consisting of three fields: a street number with up to 5 digits, the actual street name, and a street type, such as Road or Avenue. The STREET_NUM field values will be selected from an inline NUMERIC set in the full range defined as [1,99999]. The STREET_NAME will be randomly selected from the IRI-provided street_names.set file. STREET_TYPE will be randomly selected from the IRI–provided street_types.set file.
- City: The city should be one that actually exists in the indicated state. It will be selected from the second column of the ‘joined pairs’ set file state_2char_city.set.
- State: The state will have two characters selected from the first column of the same set file.
The two composite fields will be concatenated with the use of a feature called TEMPLATES. A TEMPLATE is defined with a /TEMPLATE statement, and then it is referenced in the /FIELD statement with the STRUCTURE parameter.
Note that we could add more items to the client information records, but this will suffice to demonstrate several techniques that can be used in the generation process.
Next, we need to decide how many client records must be generated. For this example, we will only generate 15 records, but we could just as easily generate 500, 1,000,000, or billions of rows. We want unique records with pipe-delimited fields for easier reading and database loading. We could also save them in tab-delimited format, with one or more fields for use in other RowGen test sets, like this one. Or we could do both at once. In fact, there is no limit to the number or format of RowGen target files or tables defined in the same generation script and job.
Specifying the Job
Every RowGen job script has at least three sections:
- Input, where the initial fields are generated. Either specify a data type for random value generation, or a set file (and lookup method) for random value selection. The number of rows to generate is also defined here.
- Action, where keys are defined if the test data is being sorted. We will sort on the account number, and specify /NODUPLICATES so that every client has a unique account number. As a result, however, it is possible the final number of records will be less than the number we chose to generate.
- Output, which consists one or more target files, tables, and/or custom reports.
Below is the RowGen job script for creating the test customer data. /FIELD statements are in parentheses. There will be a field name, data TYPE, field number designated by POS, and delimiter designated by SEP. Other parameters are defined after that, such as set files. PRECISION indicates the number of decimal places for a NUMERIC TYPE.
/INFILE=random_file_placeholder /PROCESS=RANDOM /INCOLLECT=15 # generate 15 fields /FIELD=(Acct_CODE, TYPE=ALPHA, POSITION=1, SEPARATOR="|", SET={ A,B,C,D,E }) /FIELD=(Acct_Digits, TYPE=DIGIT, POS=2, SEP="|", SIZE=15) /FIELD=(Trans_Limit, TYPE=NUMERIC, POS=3, SEP="|", PRECISION=0, SET=”limit.set” SELECT=ANY) /FIELD=(LNAME1, TYPE=ASCII, POS=4, SEP="|", SET="names_last.set") /FIELD=(FNAME1, TYPE=ASCII, POS=5, SEP="|", SET="names_first.set") /FIELD=(PBDAY, TYPE=ISO_DATE, POS=6, SEP="|", SET=” P_Bday.set” SELECT=ANY) /FIELD=(NAME_CODE, TYPE=DIGIT, POS=7, SEP="|",SET={ 0,0,0,1 }) /FIELD=(FNAME2, TYPE=ALPHA, POS=8, SEP="|", SET="names_first.set") /FIELD=(SBDAY, TYPE=ISO_DATE, POS=9, SEP="|", SET= “S_Bday.set” SELECT=ANY) /FIELD=(STREET_NUM, TYPE=NUMERIC, POS=10, SEP="|", PRECISION=0, SET={ [1,99999] }) /FIELD=(STREET_NAME, TYPE=ASCII, POS=11, SEP="|", SET="street_names.set" SELECT=ANY) /FIELD=(STREET_TYPE, TYPE=ASCII, POS=12, SEP="|", SET=".street_types.set" SELECT=ANY) /FIELD=(CITY, TYPE=ASCII, POS=13, SEP="|", SET="state_2char_city.set" SELECT=ROW[2]) /FIELD=(STATE, TYPE=ALPHA_DIGIT, POS=14, SEP="|", SET="state_2char_city.set" SELECT=ROW[1]) /NODUPLICATES # Make sure each record is unique with respect to the Account Number /KEY= Acct_CODE /KEY= Acct_Digits /OUTFILE=client.dat /PROCESS=RECORD /TEMPLATE=(AcctNum="%s%s") # Concatenates 2 string fields /TEMPLATE=(Address="%s %s %s") # Concatenates 3 string fields with a space between each /FIELD=(AcctNumber, TYPE=ASCII, POS=1, SEP="|", STRUCTURE=AcctNum(Acct_CODE,Acct_Digits)) /FIELD=(Trans_Limit, TYPE=NUMERIC, POS=2, SEP="|", PRECISION=0) /FIELD=(LNAME1, TYPE=ASCII, POS=3, SEP="|") /FIELD=(FNAME1, TYPE=ASCII, POS=4, SEP="|") /FIELD=(PBDAY, TYPE=ISO_DATE, POS=5, SEP="|") /FIELD=(LNAME2, TYPE=ASCII, POS=6, SEP="|", IF NAME_CODE EQ 0 THEN LNAME1 ELSE SET="names_last.set") /FIELD=(FNAME2, TYPE=ALPHA, POS=7, SEP="|") /FIELD=(SBDAY, TYPE=ISO_DATE, POS=8, SEP="|") /FIELD=(StreetAddress, TYPE=ASCII, POS=9, SEP="|", STRUCTURE=Address(STREET_NUM,STREET_NAME,STREET_TYPE)) /FIELD=(CITY, TYPE=ASCII, POS=10, SEP="|") /FIELD=(STATE, TYPE=ALPHA_DIGIT, POS=11, SEP="|")
You can run this script in the IRI Workbench or on the command line with:
rowgen /specification=genclient.rcl
Using the Results
This will generate output similar to the following:
A041228567690798|15|Huddleston|Jarrod|1921-04-04|Beltz|Rudolf|1982-08-06|23484 Hilda Heights|Owensville|MO A284391085036017|19|Rubinstein|Romona|1995-03-27|Rubinstein|Phyliss|1984-12-16|91115 Red Bird Trace|Lakeport|CA A554707733795739|1|Wagnon|Von|1944-05-26|Wagnon|Rona|1997-03-18|79903 Blue Ridge Street|San Jose|CA A940444121953751|13|Borum|Elvira|1993-01-11|Burney|Shelby|1940-01-09|53835 Adams Artery|Fredonia|NY B498416663279606|13|Galvin|Sanjuanita|1980-08-03|Galvin|Alvin|1989-02-07|37182 Outlaw Boulevard|Ponca City|OK B665089089214940|10|Brinker|Lucie|1962-04-27|Brinker|Kelley|1985-10-28|41512 Rosemary Trace|Bellmawr|NJ C092836176730933|16|Guillotte|Von|1984-03-14|Guillotte|Peggie|1988-12-17|50676 Whitley Avenue|Marlborough|MA C346991978386191|13|Manjarrez|Harriet|1996-06-08|Manjarrez|Karrie|1999-03-28|88221 North Chase Corridor|Plainview|TX C744423272297425|2|Olsson|Claudia|1975-01-07|Olsson|Lavern|1994-10-21|47048 Michael Trace|Salmon|ID C847549307203805|4|Spradley|Concepcion|1917-10-21|Spradley|Ahmad|1972-11-09|7045 Sanderson Loop|Wallace|NC C865940707690079|1|Jonson|Alonso|1990-04-03|Jonson|Jared|1965-07-16|62832 Baker Chapel Corridor|Lake Stevens|WA D552053977641891|18|Murdock|Wanda|1960-10-22|Murdock|Ann|1915-04-04|48010 Gracie Grove|Moultrie|GA D889743001706813|5|Shuey|Jeff|1978-06-28|Shuey|Monty|1953-10-08|6358 Elkin Avenue|Streamwood|IL E229826294718256|1|Shuford|Freida|1955-04-06|Shuford|Von|1920-10-02|96925 Leslie Village|Nashville|AR E864261095027054|2|Hocking|Nicola|1976-05-12|Henrickson|Josh|1967-06-20|34785 Squire Course|Palm Desert|CA
From this initial set file, I now have the basis of a transaction system or database. I can load the data in this set file into a table where the RowGen GUI can be used to build more sets from columns. With values the same in the other tables, referential integrity is established. So if you need to test a database performing joins, IRI CoSort (SortCL) scripts to capture changes in files, or something else, you can use RowGen to prototype a relational application.
For example, my system might use the account numbers in this file in a billing system. For test purposes, we can generate a master product file in a similar manner to the client information file above. A third transaction file (or table) may contain a product SKU (or medical billing code or bank fee) from the second dimensional file, along with the account number from the first file. When a purchase is made by a person with an account number in my base file, the billing application can invoice the address associated with that person.