Challenges
Personally identifiable information (PII) that is not needed for downstream use or recovery should be redacted (covered over with masking characters, or removed) to prevent misuse. Meanwhile, other data elements or selected parts of the PII value itself may still need to be exposed.
At the same time, the masking should preserve the original storage format and overall field appearance, so that neither the platform structure (e.g., a DB table definition) nor the application has to be altered. Masking all but the last four digits of a credit card or Social Security number with asterisks is a common data redaction requirement.
While DBAs can drop columns from tables, there are few alternatives for masking data in different ways across multiple databases and file sources.
Solutions
To redact data at rest, use the built-in functions in PII redaction software from IRI:
- IRI FieldShield for DBs and flat files
- IRI CellShield for Excel
- IRI DarkShield for unstructured files
- Their parent SortCL data manipulation program in IRI CoSort or IRI Voracity, for larger structured and semi-structured sources
To redact data in motion (dynamic data masking), use the replace_char API function in the FieldShield SDK.
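Dynamic masking differs from masking at rest in that the stored values stay intact and only the copy delivered to the consumer is masked. The Python sketch below illustrates that idea with a hypothetical `masked_view` generator and a simple privileged/non-privileged switch; it does not reproduce the FieldShield SDK or the signature of its `replace_char` function.

```python
from typing import Dict, Iterable, Iterator, Set

Row = Dict[str, str]


def masked_view(rows: Iterable[Row], sensitive: Set[str], privileged: bool) -> Iterator[Row]:
    """Yield display copies of rows, masking sensitive columns for non-privileged callers.

    The source rows are never modified: each yielded row is a new dict.
    """
    for row in rows:
        if privileged:
            yield dict(row)
        else:
            # Replace every character with '*' so the masked value keeps its length.
            yield {col: ("*" * len(val) if col in sensitive else val) for col, val in row.items()}


stored = [{"name": "Ann", "ssn": "123-45-6789"}]
print(list(masked_view(stored, {"ssn"}, privileged=False)))
```

Because masking happens only on the way out, a privileged caller reading the same `stored` rows still sees the original SSN.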
In each case, the IRI product can search for, classify, and redact a specified number of bytes, defined (sub-)strings, entire fields, or one or more rows. You choose the masking character and its start/stop positions, and you decide whether to apply the redaction based on column names or data classes, pattern matches, field value conditions, NER models, etc.
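The position-based and condition-based choices described above can be sketched in a few lines of Python. This is a hypothetical illustration, not IRI's API: `mask_range`, `redact_if_class`, and the two regex "data classes" are assumptions, `mask_char` is assumed to be a single character, and only a pattern-match condition is shown.

```python
import re

# Hypothetical data-class definitions; real classification rules are usually richer.
DATA_CLASSES = {
    "US_SSN": re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
}


def mask_range(value: str, start: int, stop: int, mask_char: str = "*") -> str:
    """Overwrite value[start:stop] with mask_char, keeping the value's length."""
    start, stop = max(start, 0), min(stop, len(value))
    if start >= stop:
        return value  # nothing to mask
    return value[:start] + mask_char * (stop - start) + value[stop:]


def redact_if_class(value: str, data_class: str, start: int, stop: int,
                    mask_char: str = "*") -> str:
    """Mask the given range only if the whole value matches the named data class."""
    if DATA_CLASSES[data_class].fullmatch(value):
        return mask_range(value, start, stop, mask_char)
    return value


print(redact_if_class("123-45-6789", "US_SSN", 0, 6, "X"))  # XXXXXX-6789
print(redact_if_class("n/a", "US_SSN", 0, 6, "X"))          # n/a
```

Column-name or field-value conditions could replace the regex test in `redact_if_class` without changing `mask_range`.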
Character masking is just one of the non-reversible protection functions provided in IRI software. Others include randomization, external source pseudonymization, and possibly hashing. Reversible protections include encryption, encoding, a proprietary ASCII de-ID function, same-source pseudonymization, expression logic, and string functions.
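To contrast character masking with hashing, here is a minimal sketch of a keyed one-way hash using only Python's standard library. It is an illustration, not IRI's hashing function; the HMAC-SHA256 choice, the key, and the 16-character truncation are assumptions. A secret key matters here because a plain hash of a small value space, such as 9-digit SSNs, can be reversed by hashing every possible input.

```python
import hashlib
import hmac


def hash_token(value: str, key: bytes, length: int = 16) -> str:
    """Return a one-way token for value: HMAC-SHA256 under a secret key, hex, truncated."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


key = b"example-secret-key"  # hypothetical; keep real keys out of source code
t1 = hash_token("123-45-6789", key)
t2 = hash_token("123-45-6789", key)
print(t1, t1 == t2)  # same input and key always give the same token
```

Unlike an asterisk mask, the token is consistent across tables, so hashed values can still serve as join keys.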
SortCL users have the additional option to transform and report on data as they redact it in whole or part.
All IRI software is supported in IRI Workbench, a free, familiar GUI built on Eclipse that exposes and profiles multiple data sources, helps configure masking jobs, and manages projects for team sharing and compliance verification.
In addition to credit card encryption and tokenization options for PCI DSS applications, IRI also provides a convenient way to mask all or part of a 16-digit credit card number in database tables and flat files.
The IRI Workbench GUI for FieldShield can automatically mask all but the last 4 digits of a credit card number field in any database table or flat file. The same dialog also lets you choose a different masking character and set of digits to mask.
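The "all but the last four digits" rule described above can be sketched in Python as follows. This is an illustration of the rule, not FieldShield's implementation; the function name `mask_all_but_last` and its `keep` and `mask_char` parameters are hypothetical stand-ins for the dialog's character and digit settings.

```python
ASCII_DIGITS = set("0123456789")


def mask_all_but_last(value: str, keep: int = 4, mask_char: str = "*") -> str:
    """Mask all digits except the last `keep`, leaving separators in place.

    Non-digit characters (spaces, hyphens) pass through unchanged, so the
    result has the same length and layout as the input.
    """
    to_mask = max(sum(ch in ASCII_DIGITS for ch in value) - keep, 0)
    out = []
    for ch in value:
        if ch in ASCII_DIGITS and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_all_but_last("4111-1111-1111-1234"))                      # ****-****-****-1234
print(mask_all_but_last("4111111111111234", keep=6, mask_char="#"))  # ##########111234
```

Counting digits rather than characters keeps the rule correct whether or not the stored value contains separators.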
This or any other data masking function can be applied ad hoc or as a cross-table rule for en-masse protection.
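A cross-table rule pairs a column-matching condition with a masking function and applies it everywhere the condition holds. The Python sketch below shows that idea on in-memory tables; `apply_rule`, `last4`, the column-name pattern, and the sample tables are hypothetical and do not represent how IRI stores or executes rules.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]


def last4(value: str) -> str:
    """Simple mask: asterisks for everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_rule(tables: Dict[str, List[Row]], column_pattern: str,
               mask: Callable[[str], str]) -> Dict[str, List[Row]]:
    """Apply `mask` to every column, in every table, whose name matches column_pattern."""
    rx = re.compile(column_pattern, re.IGNORECASE)
    return {
        table: [{col: (mask(val) if rx.fullmatch(col) else val) for col, val in row.items()}
                for row in rows]
        for table, rows in tables.items()
    }


tables = {
    "customers": [{"name": "Ann", "card_number": "4111111111111234"}],
    "payments": [{"id": "7", "PAN": "5500000000000004"}],
}
masked = apply_rule(tables, r"card_num(ber)?|pan", last4)
print(masked)
```

Matching on the full column name (`fullmatch`) avoids accidentally masking columns such as `company` that merely contain "pan".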
To create computationally valid primary account numbers (PANs) for testing, use the corresponding IRI RowGen function.
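For a PAN, "computationally valid" usually means that the final digit is a correct Luhn (mod 10) check digit. The Python sketch below is not RowGen's function; it only illustrates the check. The `4` prefix and 16-digit length are illustrative assumptions, and a Luhn-valid number is not guaranteed to be unissued.

```python
import random
from typing import Optional

DIGITS = "0123456789"


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn (mod 10) check digit for a digit string lacking one."""
    total = 0
    # Right to left, starting with the digit next to the future check digit,
    # every other digit is doubled (subtracting 9 when the result exceeds 9).
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is the correct Luhn check digit."""
    if len(number) < 2 or any(c not in DIGITS for c in number):
        return False
    return luhn_check_digit(number[:-1]) == number[-1]


def random_test_pan(prefix: str = "4", length: int = 16,
                    rng: Optional[random.Random] = None) -> str:
    """Build a random digit string of `length` digits that passes the Luhn check."""
    rng = rng or random.Random()
    body = prefix + "".join(str(rng.randint(0, 9)) for _ in range(length - len(prefix) - 1))
    return body + luhn_check_digit(body)


print(luhn_valid("79927398713"))  # True
print(random_test_pan(rng=random.Random(0)))
```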
In addition to encryption and other ways to protect National Identification (NID) numbers, IRI also provides conventional masks for popular NID formats.
The IRI Workbench GUI for FieldShield provides a dialog that automatically masks NID values in tables and flat files.
Results from these functions are shown in a Tech Tip article in the IRI Newsletter. If you are interested in generating computationally valid NID test data, see the relevant section of the IRI blog.
In addition to encryption and other ways to protect Social Security Numbers (SSNs), IRI also provides conventional and custom masking options for this 9-digit format.
The IRI Workbench GUI for FieldShield provides a dialog that automatically masks SSN field values in tables and flat files.
As with any other data masking function, SSN masks can be applied ad hoc or as a cross-table rule for en-masse protection.
Use the IRI RowGen test data product to generate valid and invalid Social Security numbers.
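What separates "valid" from "invalid" SSN test values can be illustrated with the Social Security Administration's never-assigned ranges: area numbers 000, 666, and 900-999, group 00, and serial 0000. The Python sketch below is not RowGen's logic; the function names and the hyphenated NNN-NN-NNNN input format are assumptions, and "structurally valid" does not mean a number has actually been issued.

```python
import random
import re
from typing import Optional

SSN_FORMAT = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def is_structurally_valid_ssn(ssn: str) -> bool:
    """Check an NNN-NN-NNNN string against the SSA's never-assigned ranges."""
    m = SSN_FORMAT.fullmatch(ssn)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False  # area numbers 000, 666 and 900-999 are never assigned
    return group != "00" and serial != "0000"


def random_ssn(valid: bool = True, rng: Optional[random.Random] = None) -> str:
    """Generate a random SSN-shaped test value that is structurally valid or not."""
    rng = rng or random.Random()
    if valid:
        area = rng.choice([a for a in range(1, 900) if a != 666])
        group, serial = rng.randint(1, 99), rng.randint(1, 9999)
    else:
        # Force an area number that is never assigned; other parts are arbitrary.
        area = rng.choice([0, 666, rng.randint(900, 999)])
        group, serial = rng.randint(0, 99), rng.randint(0, 9999)
    return f"{area:03d}-{group:02d}-{serial:04d}"


rng = random.Random(1)
print(random_ssn(True, rng), random_ssn(False, rng))
```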
If you manage personally identifiable data in database tables or flat files, you can select which source columns and values will remain or display in your target(s). Populate tables, reports, and hand-off files with data provided on a need-to-know basis.
With IRI Voracity, IRI FieldShield, or the SortCL program in IRI CoSort, you can selectively omit or mask parts of rows or columns in inputs or outputs based on conditional evaluation logic. That is, you can either filter out or cover over field values or their component parts (sub-strings or particular bytes) according to your business rules.
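The combination of omitting and masking under business rules can be sketched in Python as a row filter, a column list, and per-column mask functions. This is a hypothetical illustration, not SortCL syntax; `need_to_know`, the sample employee rows, and the rule "salary is visible only for HR rows" are assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def need_to_know(rows: Iterable[Row], keep_cols: List[str],
                 row_filter: Callable[[Row], bool],
                 masks: Dict[str, Callable[[Row, str], str]]) -> Iterator[Row]:
    """Yield only the rows and columns a consumer needs, masking per business rules."""
    for row in rows:
        if not row_filter(row):
            continue  # omit the whole row
        out = {}
        for col in keep_cols:  # columns not listed here are omitted
            val = row[col]
            rule = masks.get(col)
            # Each mask sees the full source row, so it can depend on other fields.
            out[col] = rule(row, val) if rule else val
        yield out


employees = [
    {"name": "Ann", "ssn": "123-45-6789", "salary": "85000", "dept": "HR", "country": "US"},
    {"name": "Bob", "ssn": "987-65-4321", "salary": "92000", "dept": "IT", "country": "US"},
    {"name": "Eve", "ssn": "555-12-3456", "salary": "70000", "dept": "IT", "country": "CA"},
]
report = list(need_to_know(
    employees,
    keep_cols=["name", "ssn", "salary"],
    row_filter=lambda r: r["country"] == "US",
    masks={
        "ssn": lambda r, v: "***-**-" + v[-4:],
        "salary": lambda r, v: v if r["dept"] == "HR" else "*" * len(v),
    },
))
print(report)
```

Here Eve's row is filtered out, the `dept` and `country` columns are omitted, every SSN keeps only its last four digits, and Bob's salary is masked because his row fails the HR condition.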
If the sensitive data reside in unstructured text files, IRI software can help you find, structure, and place that information in protected, structured targets.
Additional FieldShield functions, such as encryption, de-identification, and obfuscation, can be applied to other fields at the same time. CoSort SortCL users can also redact and mask data during data transformation, migration, and reporting operations.
If you have data in unstructured text files, Microsoft Office documents, .pdf, .rtf, or .html files, or image files (including faces), use the IRI Workbench dark data discovery wizard to find (and extract) data that matches patterns or dictionary values, or that is otherwise identified by a JSON key name, an area bounding box, or a named entity recognition (NER) NLP model. IRI partner technology does the rest.
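The pattern and dictionary search described above can be illustrated for plain text with Python's `re` module. This sketch is not the IRI Workbench wizard: the two regex patterns, the dictionary term, and `find_and_redact` are hypothetical, `mask_char` is assumed to be a single character, and JSON-key, bounding-box, and NER matching are out of scope.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative patterns only; production detectors need far more care.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_and_redact(text: str, dictionary: Iterable[str],
                    mask_char: str = "*") -> Tuple[str, List[Tuple[int, int, str, str]]]:
    """Find pattern and dictionary matches, then mask them in place.

    Returns the redacted text plus (start, end, label, matched_text) tuples,
    which could be loaded into a structured target.
    """
    found = []
    for label, rx in PATTERNS.items():
        found += [(m.start(), m.end(), label, m.group()) for m in rx.finditer(text)]
    for term in dictionary:
        rx = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        found += [(m.start(), m.end(), "DICT", m.group()) for m in rx.finditer(text)]
    chars = list(text)
    for start, end, _, _ in found:
        # Same-length slice assignment, so later match offsets stay valid.
        chars[start:end] = mask_char * (end - start)
    return "".join(chars), sorted(found)


redacted, found = find_and_redact(
    "Contact Jane Smith at jane@example.com, SSN 123-45-6789.", ["Jane Smith"])
print(redacted)
print(found)
```

Returning the matches alongside the redacted text reflects the two outputs the section mentions: a protected copy of the document and structured records of what was found.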
The IRI Voracity total data management platform can redact PII at the field level in HDFS files, cloud apps, and other big data sources (Hive, NoSQL, etc.).
Other Resources