Data Validation in IRI Software
Data validation is a process that ensures a program operates with clean, correct and useful data. It uses routines known as validation rules that systematically check for correctness and meaningfulness of data that are entered into the system. The rules are usually automated by either commands, data validation logic, or a library of applied definitions like a data dictionary. Data validation is important because users can experience security vulnerability or data corruption without it.
IRI products offer several methods to check for and discard, or otherwise flag, data that do not meet validation criteria. Meeting or failing data validation tests can affect how individual fields are displayed, or whether entire records are included or omitted in the target. The following are examples of the application of data validation functionality within those IRI’s products that can format data for hand-offs and reports, i.e., CoSort for data transformation and reporting, NextForm for data (and DB) migration, FieldShield for data masking, and Voracity for data discovery, integration, migration, governance and analytics.
IF THEN ELSE Validation
1. Use the expression builder in the IRI Workbench to populate an IF THEN expression that creates the following CoSort Sort Control Language (SortCL) job script entry that flags and omits individual field values:
/FIELD=(VALID_AGE, ASCII, POSITION=1, SEPARATOR=”,”, IF AGE < 18 THEN “Underage” ELSE AGE)
The AGE source field is valid when the value is 18 or higher. Ages will be flagged as “Underage” for minors, otherwise the original source data values will be displayed.
2. In the expression builder, you can also generate scripts that looks for exact string matches, or determine whether a field simply contains a certain string:
/FIELD=(ID_NO_FLAG, ASCII, POSITION=2, SEPARATOR=”,”, IF ID_NO CT “999” THEN “Invalid” Else “Valid”)
The ID_NO field is valid only when it does not contain three 9s in a row anywhere throughout the field. The above script entry flags fields as “valid” or “invalid” based on this string matching criterion.
3. You can incorporate one of several native functions within IF THEN ELSE logic, too. For example,
/FIELD=(VALID_CREDIT_CARD_LENGTH, ASCII, POSITION=3, SEPARATOR=”,”, IF length(CREDIT_CARD) NE 16 THEN “Error” ELSE “”)
The above uses the length() function for data validation to ensure whether the credit card value has a size of 16. If the field length does not equal (NE) 16, then the field is flagged with “Error”. Â Otherwise the credit card value field is considered valid and, in this example, is not displayed (ELSE “”).
In addition to the length() function, there are several other native functions that can be used to evaluate data fields for data validation purposes, all of which are accessible via the expressions builder shown below.
The following ‘iscompare’ functions are available for CoSort and FieldShield to build useful expressions for data validation:
C-Style Validation
- isalphadigit(field)
Equivalent to the C library test named isalnum. True if all characters are alphanumeric characters. This is equivalent to (isalpha(field) || isdigit(field)). - isalpha(field)
True if all characters are alphabetic characters, in the current locale. This is equivalent to (isupper(field) or islower(field)). In some locales, there might be additional characters for which isalpha(field) is true, that is, there can be letters that are neither upper-case nor lower-case. - isascii(field)
True if each character is a 7-bit unsigned char value that fits into the ASCII character set. - iscntrl(field)
True if all characters are control characters. - isdigit(field)
True if all characters are digits (0 through 9). - isgraph(field)
True if all characters are printable (except space). - islower(field)
True if all characters are lower-case. - isprint(field)
True if all characters are printable (including space). - ispunct(field)
True if all printable characters, except a space or an alphanumeric.
Non-C Style Validation
- isempty(field)
Returns true for null fields or those that satisfy isspace(field). - isnumeric(field)
Same as isdigit(field), but also recognizes period (.), plus (+), and minus (-). At least one character must be a digit. - isebcalpha(field) True if all characters are EBCDIC alphabetic.
- isebcdigit(field)
True if all characters are EBCDIC digits. - isholding(value1,value2)
True if value2 is contained within value1. value1 and/or value2 can be a literal value or a field name, for example isholding(ACCOUNT,” # “). - ispattern(field,”expression”)
Checks the field using Perl-compatible regular expressions such as a+bc. - ispacked(field)
Checks the field to make sure each nibble, except for the last one, contains a 0-9 value, and that the last nibble contains a hex b, c, d, or f.
Selection Logic Validation
The IRI Workbench can execute data validation through the Include/Omit dialog and the “Where Clause” expressions builder to form /INCLUDE or /OMIT statements (or to reference previously defined conditions, by name) that filter entire records based on data validation criteria. This will generate statements in the job script. Look at this example:
/INCLUDE WHERE AGE >= 18
When applied to the input section of a job, the above statement will ensure that only those records where the AGE value is 18 or above will be processed and displayed on output. It is recommended that record filtering is performed in the input section, where possible (i.e., when multiple output files with different filtering requirements are not required) so that processing time is not wasted on records that are not required in the target. Note that all of the data validation techniques/examples mentioned above can be applied within INCLUDE and OMIT statements, as well as at the field level.
So these are some examples of how data validation can be performed in the IRI Workbench GUI, an Integrated Development Environment (IDE) built on Eclipse. If you need help, contact support@iri.com.