Generating Test NID Data: United States Social Security Numbers
The Social Security Number (SSN) in the United States is similar to many other forms of national identification around the world: it is a number assigned to a citizen, or to a person who has entered the country for an extended stay on a visa. The US SSN consists of nine digits in the format:
AAA-GG-SSSS
The first three digits are the “area” number, which was historically tied to a state (the state where the number was issued, rather than strictly the state of birth). The next two digits are the “group” number, which subdivides the numbers issued within an area. The last four digits are the “serial” number, which ranges from 0001 to 9999 within each group. SSNs issued since June 2011 are assigned at random, so the area number carries geographic meaning only for older numbers.
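The three-part layout described above can be checked mechanically. The following is a minimal Python sketch, not part of RowGen; the names split_ssn and is_issuable are hypothetical. It assumes the Social Security Administration's published rules that area numbers 000, 666, and 900-999, group number 00, and serial number 0000 are never issued.

```python
import re

# AAA-GG-SSSS: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def split_ssn(ssn):
    """Return (area, group, serial) as strings, or raise ValueError."""
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        raise ValueError(f"not in AAA-GG-SSSS format: {ssn!r}")
    return match.groups()


def is_issuable(ssn):
    """Return True if the SSN falls outside the never-issued ranges.

    Assumption: only structural rules are checked. This says nothing
    about whether the number was actually assigned to anyone.
    """
    area, group, serial = (int(part) for part in split_ssn(ssn))
    if area in (0, 666) or area >= 900:
        return False
    return group != 0 and serial != 0
```

A check like this only confirms that a value is shaped like an SSN and avoids the reserved ranges; it is useful for sanity-testing generated data, not for verifying identity.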
IRI has created new library functions for RowGen users who wish to generate either valid or invalid SSNs. To create a valid, correctly formatted SSN, call the special RowGen test data generation function natid_gen_us in a field definition. It is used in a RowGen job script as follows:
/INFILE=RowGen_SSN_valid.placeholder.in
    /ALIAS=RGSSN
    /PROCESS=RANDOM
    /INCOLLECT=1 # Number of records to produce
    /FIELD=(NATID=natid_gen_us("FL"), TYPE=ASCII, POSITION=1, SEPARATOR="\n")
/REPORT
/OUTFILE=stdout
    /PROCESS=RECORD
    /FIELD=(NATID, TYPE=ASCII, POSITION=1, SEPARATOR="\t")
The output from this routine is:
26X-56-3XXX
Note:
- FieldName (NATID in the script above) is the desired name for the field.
- State ("FL" in the script above) is the two-character abbreviation, e.g. FL for Florida, of the state to use when generating the SSN; it must be entered in uppercase.
- Omitting the State parameter from the function call generates a US SSN at random.
- The /INCOLLECT value determines the number of test records created.
- The four X’s in the output above replace actual digits, which are concealed to protect any person associated with the SSN that was generated.
To protect citizens from identity fraud, you can instead generate an invalid US Social Security Number, one built from special rules such as the value ranges reserved for SSN-related advertising. A RowGen job script calling the natid_gen_invalid_us function to create a properly formatted, but invalid, US SSN might be:
/INFILE=RowGen_SSN_invalid.placeholder.in
    /ALIAS=RGSSN
    /PROCESS=RANDOM
    /INCOLLECT=1 # Number of records to produce
    /FIELD=(NATID=natid_gen_invalid_us(), TYPE=ASCII, POSITION=1, SEPARATOR="\n")
The output for this routine is:
987-65-4325
Note:
- FieldName (NATID in the script above) is the desired name for the field.
- There are no parameters for this function because it randomly generates the numbers using advertising-related rules for creating US SSNs.
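To illustrate what an advertising-related rule can look like, here is a minimal Python sketch. It is not RowGen's implementation, and the function name gen_advertising_ssn is hypothetical. It assumes the block 987-65-4320 through 987-65-4329, which is reserved for use in advertisements and is never issued to a person; the sample output above falls inside that block.

```python
import random


def gen_advertising_ssn(rng=random):
    """Return a well-formed but never-issued SSN.

    Assumption: only the advertising block 987-65-4320..987-65-4329
    is used; the actual RowGen function may apply other rules as well.
    """
    serial = rng.randint(4320, 4329)  # randint is inclusive at both ends
    return f"987-65-{serial}"


if __name__ == "__main__":
    # Mirrors /INCOLLECT=1 in the job script: produce a single record.
    print(gen_advertising_ssn())
```

Passing a seeded random.Random instance as rng makes the output reproducible, which can help when the same test data set must be regenerated.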
Any number of records can be generated and sent to one or more output file or table targets, in one or more (custom) formats with or without other data fields. The other fields can either be randomly generated, or randomly selected from real data, to provide realism and privacy together. If you are interested in using either of these functions in RowGen, please contact your IRI representative.