How to Mask PII in a Database

by David Friedland

First released in 2011, the IRI FieldShield database data masking tool helps you classify, find, and de-identify personally identifiable information (PII) or other sensitive data in structured relational database (RDB) sources. FieldShield also supports fixed and delimited files, flat XML and JSON, Excel, and ASN.1 CDR files. IRI DarkShield supports RDBs and other sources, too. ¹

This article only summarizes the steps FieldShield users normally take to connect to, classify, discover, mask, and audit PII in a relational database. Below you will see links to other articles in the IRI blog which detail each of these steps. ²

This diagram represents the most common workflow:

Step-by-Step

By way of more detail, FieldShield users typically follow these steps:

1. Once you have installed IRI Workbench, and licensed the FieldShield masking executable, make both JDBC and ODBC connections to each database to be masked. For files, identify local or remote folder locations.

2. Use wizards in the data discovery menu to profile your data sources, build E-R diagrams to model RDB schema, perform referential integrity checks, and search for specific items matching a pattern. You can find much more however using data classification (see below).

3. Define data classes (e.g., passport #, name, phone), or class groups (e.g., citizen PII) as needed, and assign to them search methods and masking functions. Then run the search wizard for your DB schema (or file folders). The search jobs produce log reports, dashboard charts, and a Data Class Map you can review before the bulk data masking wizards use it to consistently apply your rules to your data classes (which ensures referential integrity).

4. Run the New Data Class Map DB Masking Job … or the New Data Class Map File Masking Job … wizard. You will apply a default or chosen data masking function for each data class based on its need for reversibility, realism, uniqueness, consistency, and security. You can also configure these data masking jobs in mapping diagrams, 4GL scripts, or an IRI API.

5. Jobs once created are easily run and modified for iterative testing, where output can first be virtualized to the console, file or sample tables for review, modification, sharing, and re-use. SQL query and update logic can be inside scripts or workflows for real time or incremental masking jobs.

6. An included risk-scoring wizard statistically measures the likelihood that a masked data set (table) can still be used to identify an individual based on quasi-identifying information in the row.

7. Further masking jobs can address the risks exposed in the scoring report by blurring or bucketing quasi-identifying values like birthdate and age. These techniques render the data anonymous, but still fit for purpose.

8. As you gain success and confidence in the above processes, and begin to satisfy production data masking needs, you should also take stock of your job performance and metadata assets (source layouts, task and batch scripts, encryption keys, etc.). Use an IRI Workbench plugin like Git to secure, share, version control, and track changes to your metadata.

9. During and after the go-live dates, follow IRI recommendations to monitor performance, manage updates to data, conditions and software and address new use cases or audit requirements.

You can mask data in your production or test schema; you define the sources, targets, and flow. The most common is ETL; i.e., read from production, masking in the engine and writing to test.

Watch this video of a typical job: https://www.youtube.com/watch?v=ALwQA9OigK8&ab_channel=IRI-TotalDataManagement

What’s Next?

After FieldShield masking is performed, you can also:

Validate the results in your target tables for privacy, realism, and referential integrity
Audit the discovery (search), data masking job, and performance logs
Automate the batch via the IRI (or your) job scheduler or integrate it into a CI/CD pipeline
Perform re-ID risk-scoring (usually for HIPAA expert determination method compliance)
Build and run additional anonymization jobs to further obscure quasi-identifiers
Add additional targets, filters, transforms, DQ functions, or formatting to any of the jobs
Use its data classes and masking rules in a DB subsetting, Ripcurrrent, or DarkShield job
Share, version control, change-track and otherwise secure any job artifacts in Git

If you have any questions or need help masking your database, email fieldshield@iri.com.

If you have semi-structured or unstructured data in your RDB, or you are interested in finding and masking PII in NoSQL DBs, EDI files, PDF and MS documents, or image files, check out IRI DarkShield. DarkShield uses the same GUI, data classes, and masking rules as FieldShield to provide data integrity on-premise or in the cloud enterprise wide. You can also use DarkShield for RDB sources as well; see this FAQ for guidance vs. FieldShield.
There is also help within the IRI Workbench GUI for FieldShield, YouTube videos, and product manual. Self-directed and IRI-supervised learning services are also available to expedite your solution.