Database data masking

by David Friedland

Protecting Data at Rest and in Motion

Database applications that update and query tables may need to secure data going into, or being retrieved from, those tables. Sensitive data must be protected on the way into the table, once in the table, or on the way out. In every case, the goal is to keep unauthorized people from accessing certain rows or column values considered sensitive.

For normal form (structured) data in relational databases, the IRI FieldShield standalone data masking utility, or its compatible library of callable masking functions, can accommodate the scenarios above, providing many options for both static (persistent, at rest) and dynamic (in-transit) data masking options. If you also have PII/PHI values floating randomly within semi- and unstructured RDB columns (like text, C/BLOB, XML/JSON), see this article on the IRI DarkShield approach.

This article will talk FieldShield options, however, since most RDB users are concerned with finding and masking fixed column vales. FieldShield users can classify, locate, and mask data in DB tables, and run jobs via:

a static data masking on production tables, and dynamic unmasking by authorized app users
masking functions embedded in metadata-compatible transform, cleansing, or reporting jobs
CDC-based incremental data masking and replication to refresh test rows in real time
through a SDK API, or system, call from a database application program
in-situ, via SQL procedures using a custom library

If your data are in NoSQL databases like MongoDB, Cassandra, Elasticsearch or MarkLogic, FieldShield would handle structured collections, while DarkShield would handle both semi-structured and unstructured collections, too.

By default — and the way the IRI Workbench GUI presents FieldShield to end-users and DBAs — one or more full tables are typically secured with static data masking functions (like encryption, redaction, pseudonymization) according to the business rules. The choice of each ‘field (column) shield’ should be based on security, realism, reversibility, and perhaps CPU or storage considerations.

The static approach works fine when data are at rest. Either the source table can be protected, or a new target table or file with protections applied can be created.

Applications which fetch the protected data need not be concerned about security because their data sources were pre-protected. FieldShield programs designed to protect data at rest can also be scheduled or called into batch jobs for regular updates.

However, in a real-time environment where updated rows need dynamic protection, FieldShield functions must be integrated into the application’s data flow. There are several approaches to consider:

1) Your Program Calls FieldShield
Database and other programs which fetch from, or send new or changed data into, tables can pass it into a FieldShield encryption, hashing, encoding, or redaction API function in C, Java, or .NET. You could also pass the data in motion into a standalone FieldShield program through a pipe or input procedure. Any of these methods can populate targets with column-level protect (or reveal) functions applied.

2) FieldShield Protects Only Updates
FieldShield programs can also be customized via /QUERY and /UPDATE functions, or to conditionally filter only new records, where the rows to be protected meet specific criteria, like newness. API calls would allows for even more granular business logic and facilitate more data ‘velocity’ (or latency) conditions — for example, more real-time data masking — because those needs can be expressed through the application and defined by its logic. See this PL/SQL example of real-time masking based on a trigger.

3) Capture and Protect Deltas in CoSort (FieldShield Parent)
Change data capture (CDC) jobs with FieldShield masking rules can run automatically and incrementally in IRI Ripcurrent, or be programmed in CoSort’s Sort Control Language (SortCL) jobs. Either way, selected inserts, updates, deletes or unchanged rows pass to tables or files with column protections applied on schedule, ad hoc, or in real-time. CoSort users have all FieldShield data masking functions at their disposal, so it is also possible mask only changed data in this bulk reporting paradigm.

4) Custom Integrations
IRI can work with your DBA or application programmer to design a bespoke solution that involves elements of the above, or to provide FieldShield libraries your SQL procedures can invoke in-situ like these to protect or reveal (encrypt or decrypt) query results, query tables, materialized views, and so on.

Contact fieldshield@iri.com for help with masking your database data. Please tell us your data sources and whatever you can about your use case and requirements so we can identify and present the most relevant database data masking solution for you as soon as possible.

Selecting Personally Identifiable Information for Secure Queries (FieldShield Filters)

FieldShield v3: More Data Masking Functions and Cross-Table Protection Rules