How to Create a Custom Data Class Validator
Abstract: This is the second of a two-part series detailing data class validation in IRI Workbench. The first article, here, provides an overview of the validation scripts and how to use them in a data discovery or classification job. This article shows how to create a custom validation script for a special data class or group.
In this article, we will create and format a credit card validation script for use in a custom data class. Note that IRI Workbench already provides a credit card data class and validation script for your convenience.
As a prerequisite, you should be familiar with ES5 JavaScript and Java 8. To follow along, you will need an IDE or text editor that supports ECMAScript 5 (ES5) JavaScript and Java 8.
I will be using Visual Studio Code, an open source code editor from Microsoft, for this section of the tutorial. Although I won’t go into detail on how to set up Visual Studio Code, you can find more information about the setup process here and here.
How IRI Workbench Interprets and Uses the Code
Before we get into the tutorial, it helps to have a brief overview of how the IRI Workbench platform interprets a validation script. When a JavaScript file is uploaded into a custom data class, Workbench attempts to run the code through the Java Scripting API (javax.script) and its ScriptEngine.
The script engine then makes the JavaScript file implement a validation interface that contains a method called validate. An example of the validation interface can be seen in the image below (written in Java):
To do so, the engine searches the JavaScript file for a function of the same name and makes it callable from Java. For your convenience, the image below displays sample Java code calling and executing a method contained in the validation script.
Limitations of the Java Scripting Engine API
The Java Scripting API uses the Nashorn engine to interpret JavaScript code. This imposes a few notable limitations to keep in mind when creating your script:
- The engine only implements the ECMAScript 5.1 Specification. ES6 syntax is not supported.
- The Nashorn engine does not have a console object. Running a script with console.log("Hello World") will throw an error. Use the Nashorn print function instead. For example, print("Hello", "World") will print its arguments to standard output.
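As an illustration of both limitations, here is a minimal sketch written in ES5 only: var instead of let or const, function declarations instead of arrow functions, and string concatenation instead of template literals. The helper names log and describeResult are hypothetical and not part of IRI Workbench. The log helper falls back to print when no console object exists, which makes it easier to test the same file outside Nashorn.

```javascript
// ES5 only: no let/const, no arrow functions, no template literals.
var PREFIX = "Checked value: ";

// Hypothetical helper: builds a message with plain string concatenation.
function describeResult(value, isValid) {
    return PREFIX + value + " -> " + (isValid ? "valid" : "invalid");
}

// Hypothetical helper: writes a message with whatever the runtime offers.
// console is checked first because Nashorn has no console object,
// while Node.js (handy for local testing) has no global print function.
function log(message) {
    if (typeof console !== "undefined" && typeof console.log === "function") {
        console.log(message);
    } else if (typeof print === "function") {
        print(message);
    }
}

log(describeResult("12345", false));
```

Remove or silence such logging before uploading the script if you do not want the extra output during a classification job.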
Step 1: Create the File
To get started, open up your preferred text editor and create a new JavaScript file. In the image below you can see I created a JavaScript file named validator-creditcard.js.
Step 2: Define the Validate Method
The JavaScript file must have a function named validate to work properly. This function will take in a single argument and will return either true or false.
You can consider this the most important function within the script since it will be the one invoked by IRI Workbench. Therefore, all validation logic should be contained in this function.
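A minimal sketch of such a function might look like the following. The rule inside it (accept any non-empty value) is only a placeholder assumption to show the shape of the function; real validation logic would replace it.

```javascript
// Entry point that IRI Workbench looks for: one argument in, true or false out.
function validate(input) {
    // Placeholder rule (an assumption for illustration): accept any non-empty value.
    if (input === null || input === undefined) {
        return false;
    }
    return String(input).length > 0;
}
```

Helper functions can live elsewhere in the same file, as long as validate is the function that returns the final result.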
Step 3: Write the Logic
The logic will vary depending on the data you’re validating. For credit cards, the only validation logic needed is a simple checksum using the Luhn algorithm.
I won’t be going into detail on how to implement this algorithm but a good example can be found here. In the image below, you can see I implemented the validation logic using a helper function.
A few things to note:
- The input argument will always be a String
- The return value must be either true or false.
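For readers without access to the image, the following is one possible ES5 sketch of a Luhn check behind a validate function. It is not the script shipped with IRI Workbench. The helper name passesLuhn is hypothetical, and stripping spaces and hyphens before the check is an assumption about the input format.

```javascript
// Hypothetical helper: returns true when a string of digits passes the Luhn checksum.
function passesLuhn(digits) {
    var sum = 0;
    var doubleNext = false;
    // Walk the digits from right to left, doubling every second one.
    for (var i = digits.length - 1; i >= 0; i--) {
        var d = digits.charCodeAt(i) - 48; // "0" has character code 48
        if (doubleNext) {
            d *= 2;
            if (d > 9) {
                d -= 9; // same as adding the two digits of the product
            }
        }
        sum += d;
        doubleNext = !doubleNext;
    }
    return sum % 10 === 0;
}

// Entry point: takes the matched value, returns true or false.
function validate(input) {
    // Assumption: separators such as spaces and hyphens may appear in the match.
    var digits = String(input).replace(/[\s-]/g, "");
    if (!/^\d+$/.test(digits)) {
        return false; // empty or non-numeric input cannot pass
    }
    return passesLuhn(digits);
}
```

With this sketch, validate("4111 1111 1111 1111") returns true, while changing the last digit to 2 makes it return false.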
You may be wondering why the function contains no pattern matching. That’s because IRI Workbench has a separate field for uploading patterns (more on this in the next section). It runs your provided pattern first and then passes each match to the validation script.
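The order of operations can be pictured with a small simulation. This is not how IRI Workbench is implemented internally; the function runMatcher, the sample pattern, and the stand-in validator are all hypothetical and exist only to show that the pattern narrows the candidates before the validator decides on each one.

```javascript
// Hypothetical simulation: apply the pattern first, then the validator.
function runMatcher(text, pattern, validateFn) {
    var candidates = text.match(pattern) || []; // match() returns null when nothing is found
    var accepted = [];
    for (var i = 0; i < candidates.length; i++) {
        if (validateFn(candidates[i]) === true) {
            accepted.push(candidates[i]);
        }
    }
    return accepted;
}

// Hypothetical pattern: any standalone run of 13 to 19 digits.
var SAMPLE_PATTERN = /\b\d{13,19}\b/g;

// Stand-in validator for the demo: rejects candidates that start with "0".
function standInValidate(input) {
    return String(input).charAt(0) !== "0";
}

var found = runMatcher(
    "ids: 4111111111111111, 0000000000000000, 12345",
    SAMPLE_PATTERN,
    standInValidate
);
```

Here "12345" never reaches the validator because the pattern rejects it, and the all-zero value is matched by the pattern but rejected by the validator, so found holds only the first value.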
Adding a Validation Script to a Data Matcher
Validation scripts are used during the data classification process and are assigned inside Data Matchers. Data Matchers match against the data itself, whereas Location Matchers match against the structure or metadata of a source. Currently, the pattern matcher is the only type of Data Matcher that supports validation scripts.
To assign a validation script to a pattern matcher, we can either create a new pattern matcher or edit an existing one in a Data Class from the Data Class & Rule Library. In the example above, there is a data class called Credit_Card_Char.
This data class contains a data matcher that uses a pattern to match credit card numbers. Clicking Edit … will open the dialog to edit the parameters of the data matcher.
On this page, I added the file path to a validation script. Click OK to close the dialog. Now the Data Matcher stored inside the Credit_Card_Char data class will match credit card numbers against a pattern and then validate each match using the validation script provided.
If you have any questions about how to classify data for IRI Workbench-supported software like FieldShield, DarkShield, or CellShield EE (or Voracity, which includes them), email support@iri.com.