Masking Floating PII in Database Columns
Calling DarkShield from FieldShield: Reaching the DarkShield API with a Field-Level Transformation Function
Overview
It is now possible to use an IRI DarkShield function from within an IRI FieldShield job to find and mask one or more instances of PII floating inside the text of specified columns. This feature allows you to leverage the best of both data masking tools in certain use cases, particularly when the functional extensibility of FieldShield through SortCL scripting is required. 1
This new combination works through a new SortCL field-level transformation function called darkshield_api_mask. The function can search and mask the contents of semi-structured or unstructured data within a relational database (RDB) column or flat-file field by making requests to the Plankton (DarkShield RPC) API.
The function has four required arguments, and one optional argument:
- The name of a field
- The name of an existing File Search Context (FSC)
- The name of an existing File Mask Context (FMC)
- The URL of a running instance of the DarkShield API.
- (Optional) A content-type to include in the request. The DarkShield API does automatic content-type detection, so this is only necessary when DarkShield’s automatic detection is unable to discern the content type, such as in the case of an HL7 message, which cannot be distinguished from plain text.
The DarkShield function call will send the contents of the field (column value) to the endpoint provided by the DarkShield-Files plugin of the DarkShield API for searching and masking a file in one pass. It will then return the masked contents based on search matchers, masking rules, filters, and configuration options defined in the file search and file mask contexts referenced by the function.
This article explains how to configure those settings and shows sample FieldShield job script invocations of the function.
Prerequisites
- A running license of IRI FieldShield or Voracity
- A running instance of the DarkShield API
- IRI CoSort build tag 240105 or later
- Text search contexts and mask contexts must be set up (with names) to define search matching methods and pair each matcher with a masking rule. Separate file search contexts and file mask contexts must also be specified. A file search context can add file-type-specific search methods or options, such as JSON path matchers. See this article for more details on search and mask contexts, and this article for more details on file search and mask contexts, which are separate pairs of contexts. Both pairs of contexts are part of the request body sent to the DarkShield API through endpoints like /api/darkshield/searchContext.create and/api/darkshield/files/fileMaskContext.create.
Limitations
- SortCL currently has an addressable record (row) length limit of 64KB, so any data beyond that position cannot be mapped to a field, nor thus masked in FieldShield.
- This function currently supports only textual data (free text, XML, JSON, etc.). There is some simple escaping/unescaping of newlines currently implemented.
- If base 64 encoding/decoding were to be used instead, it would be possible to handle binary data. However, PDFs and images often would likely run up against the record length limit.
If your RDB has longer rows or binary data in BLOB or CLOB columns for example, forgo FieldShield in favor of DarkShield, and use the DarkShield RDB search/mask job wizard, or write a program to call the DarkShield-RDB RPC API to mask all the DB data the same way.
FieldShield Job Script Examples Utilizing the Field-Level DarkShield Function
1. Masking an Unstructured Text File
This example takes an unstructured text file and sends each line as text to be searched and masked by the DarkShield API. The length of each line must not exceed 64 KB.
Unmasked Input File
FieldShield Job Script
/INFILE=example.txt /FIELD=(FIELD1, SEPARATOR="\t", position=1) /REPORT /OUTFILE=output /FIELD=(FIELD1_mask=darkshield_api_mask(FIELD1, "FSC", "FMC", "http://localhost:8959/api/darkshield/files/fileSearchContext.mask", "text/plain"), position=1, separator="\t") # FSC- File Search Context name # FMC- File Mask Context name # URL of DarkShield API endpoint # Optional content-type
This job uses these file search and file mask contexts already created (defined as follows):
Files API Search Context
{ "name": “FSC”, "matchers": [ { "name": “SearchContext”, "type": "searchContext" } ] }
Files API Mask Context
{ "name": “FMC”, "rules": [ { "name": “MaskContext”, "type": "maskContext" } ] }
These file API contexts above reference the search and mask contexts from the base API below to define a search matcher for SSNs and email addresses (based on regular expression patterns), and a hashing rule for emails and a partial string redaction rule for SSNs:
Base API Search Context
{ "name": “SearchContext, "matchers": [ { "name": "SsnMatcher", "type": "pattern", "pattern": "\b(\d{3}[-]?\d{2}[-]?\d{4})\b" }, { "name": "EmailMatcher", "type": "pattern", "pattern": "\b[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,4}\b" }, ] }
Base API Mask Context
{ "name": “MaskContext”, "rules": [ { "name": "HashEmailRule", "type": "cosort", "expression": "hash_sha2(${EMAIL})" }, { "name": "RedactSsnRule", "type": "cosort", "expression": "replace_chars(${SSN},'*',1,3,'*',5,2)" } ], "ruleMatchers": [ { "name": "EmailRuleMatcher", "type": "name", "rule": "HashEmailRule", "pattern": "EmailMatcher" }, { "name": "SsnRuleMatcher", "type": "name", "rule": "RedactSsnRule", "pattern": "SsnMatcher" } ] }
Masked Output File
2. Masking PII in an XML Column of a RDB Table
This example reads data from a database table with a column that contains XML, and writes the masked data to a target table in another schema with the same structure.
The ‘datum’ column contains the XML. This column has the DarkShield function applied, which will send the data from that column to the DarkShield API for searching and masking. The XML content-type allows for file-type-specific search matchers and filters like XML paths to be used.
Source Table
FieldShield Job Script
/INFILES="devonk.tester2;DSN=localSQLSERVER;EUID=qq0zq/IvfdnDyH1FLfcytw==;EPWD=m9xgjH0iJvBAGf7oyn418Q==;" /PROCESS=ODBC /ALIAS=LOCALSQLSERVER_dbo_transactions /FIELD=(ID, TYPE=ASCII, POSITION=1, SEPARATOR="\t", ODEF="ID") /FIELD=(XML_COL, TYPE=ASCII, POSITION=2, SEPARATOR="\t", ODEF="datum") /REPORT /OUTFILE=out # direct mapping of input table to a flat file (no value changes) /OUTFILE="dbo.xml_masked;DSN=localSQLSERVER;EUID=qq0zq/IvfdnDyH1FLfcytw==;EPWD=m9xgjH0iJvBAGf7oyn418Q==;" /PROCESS=ODBC /CREATE /FIELD=(ID, TYPE=NUMERIC, PRECISION=0, POSITION=1, SEPARATOR="\t", ODEF="ID") /FIELD=(XML_MASKED=darkshield_api_mask(XML_COL, "FileSearchContext", "FileMaskContext", "http://localhost:8959/api/darkshield/files/fileSearchContext.mask", "application/xml"), TYPE=ASCII, POSITION=2, SEPARATOR="\t", ODEF="datum")
Once again, both sets of contexts are needed; i.e., first the file search and mask contexts, and then the base search and mask contexts they reference. The search context in this case specifies the discovery of phone numbers and emails through matches to regular expression patterns. The mask context pairs each search matcher to a data masking rule; in this case phone numbers are paired to a format-preserving encryption masking rule, while emails are paired with a hashing rule.
Files API Search Context
{ "Name": “FileSearchContext”, "matchers": [ { "name": “SearchContext”, "type": "searchContext" }, { "name": "NameMatcher", "type": "jsonPath", "jsonPath": "$..name" }, { "name": "NameMatcher", "type": "xmlPath", "xmlPath": "//name" } ] }
Files API Mask Context
{ "name": “FileMaskContext”, "rules": [ { "name": “MaskContext”, "type": "maskContext" } ], "configs": { "json": { "prettyPrint": true } } }
Base API Search Context
{ "name": “SearchContext”, "matchers": [ { "name": "EmailMatcher", "type": "pattern", "pattern": "\\b[\\w._%+-]+@[\\w.-]+\\.[A-Za-z]{2,4}\\b" }, { "name": "PhoneMatcher", "type": "pattern", "pattern": "\\b(\+?1?([ .-]?)?)?(\\(?([2-9]\d{2})\\)?([ .-]?)?)([2-9]\\d{2})([ .-]?)(\\d{4})(?: #?[eE][xX][tT]\\.? \\d{2,6})?\\b" }, { "name": "NameMatcher", "type": "ner", "modelUrl": “http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin”, "sentenceDetectorUrl": “http://opennlp.sourceforge.net/models-1.5/en-sent.bin”, "tokenizerUrl": “http://opennlp.sourceforge.net/models-1.5/en-token.bin” } ] }
Base API Mask Context
{ "name": “MaskContext”, "rules": [ { "name": "HashRule", "type": "cosort", "expression": "hash_sha2(${INPUT})" }, { "name": "FpeRule", "type": "cosort", "expression": "enc_fp_aes256_alphanum(${INPUT})" } ], "ruleMatchers": [ { "name": "FpeRuleMatcher", "type": "name", "rule": "FpeRule", "pattern": "PhoneMatcher|NameMatcher" }, { "name": "HashRuleMatcher", "type": "name", "rule": "HashRule", "pattern": "EmailMatcher" } ] }
Target Table
Using the IRI Workbench Rule Dialog
It is also possible to apply the DarkShield API call as a field-level data masking rule (associated with a data class) automatically applied to multiple tables or files through the data rule dialog in IRI Workbench:
Selecting the DarkShield API Call option will open up the specification options in this dialog page:
As you move through the argument blanks, a description of what to enter is provided, as shown above. Note that File Search and File Mask contexts names automatically subsume your previously defined base (text) API search and mask contexts so they do not have to be respecified here.
End Notes
Other possible common applications of this function not demonstrated here (but following a similar framework) include masking unstructured text like notes or emails in a database column, masking JSON in a database column, or masking only portions of a field.
A bad response from any request sent to the DarkShield API by this field-level function will result in an error code being assigned to the external transform structure (CE_EXT_ABORT) that gets propagated to the sortcl executable. An error message will appear at runtime in the FieldShield (SortCL) console.
If you are interested in using the FieldShield or DarkShield data masking tools, separately or in combination, email info@iri.com.
- See this FAQ comparing the FieldShield and DarkShield tools for database masking.