Data Masking in DB Applications
Protecting Data at Rest and in Motion
Database applications that update and query tables may need to secure data going into, or being retrieved from, those tables. Sensitive data must be protected on the way into the table, once in the table, or on the way out. In every case, the goal is to keep unauthorized people from accessing certain rows or column values considered sensitive.
For normal form (structured) data in relational databases, the IRI FieldShield standalone data masking utility, or its compatible library of callable masking functions, can accommodate the scenarios above, providing many options for both static (persistent, at rest) and dynamic (in-transit) data masking options. If you also have PII/PHI values floating randomly within semi- and unstructured RDB columns (like text, C/BLOB, XML/JSON), see this article on the IRI DarkShield approach.
This article will talk FieldShield options, however, since most RDB users are concerned with finding and masking fixed column vales. FieldShield users can classify, locate, and mask data in DB tables, and for application users, run via:
- a static data masking on production tables, and dynamic unmasking by authorized app users
- masking functions embedded in metadata-compatible transform, cleansing, or reporting jobs
- a proxy-based dynamic data redaction system
- through a SDK API, or system, call from a program
- in-situ, via SQL procedures using a custom library
If your data are in NoSQL databases like MongoDB, Cassandra, Elasticsearch or MarkLogic, FieldShield would handle structured collections, while DarkShield would handle both structured and unstructured collections.
By default — and the way the IRI Workbench GUI presents FieldShield to end-users and DBAs — one or more full tables are typically secured with static data masking functions (like encryption, redaction, pseudonymization) according to the business rules. The choice of each ‘field (column) shield’ should be based on security, realism, reversibility, and perhaps CPU or storage considerations.
The static approach works fine when data are at rest. Either the source table can be protected, or a new target table or file with protections applied can be created. Applications which fetch the protected data need not be concerned about security because their data sources were pre-protected. FieldShield programs designed to protect data at rest can also be scheduled or called into batch jobs for regular updates.
However, in a real-time environment where updated rows need dynamic protection, FieldShield functions must be integrated into the application’s data flow. There are several approaches to consider:
1) Your Program Calls FieldShield
Database and other programs which fetch from, or send new or changed data into, tables can pass it into a FieldShield encryption, hashing, encoding, or redaction API function in C, Java, or .NET. You could also pass the data in motion into a standalone FieldShield program through a pipe or input procedure. Any of these methods can populate targets with column-level protect (or reveal) functions applied.
2) FieldShield Protects Only Updates
FieldShield programs can also be customized via /QUERY and /UPDATE functions, or to conditionally filter only new records, where the rows to be protected meet specific criteria, like newness. API calls would allows for even more granular business logic and facilitate more data ‘velocity’ (or latency) conditions — for example, more real-time data masking — because those needs can be expressed through the application and defined by its logic. See this PL/SQL example of real-time masking based on a trigger.
3) Capture and Protect Deltas in CoSort (FieldShield Parent)
Change data capture (CDC) jobs can also be programmed in CoSort’s Sort Control Language (SortCL), whereupon only selected inserts, updates, deletes or unchanged rows can be passed to tables or files with column protections applied as this happens. CoSort users have all FieldShield protection functions at their disposal, so it is possible mask only changed data in this bulk reporting paradigm.
4) Proxy-based Query Intercepts
IRI now provides a special “JDBC SQL Trail” driver for applications to use that filters DB queries for authorized users and particular tables and columns. Those not exempted to the DDM policy definition will only see fully or partially redacted values in flight to that specially connected application from the database, which remains unmasked at rest.
5) Custom Integrations
IRI can work with your DBA or application programmer to design a bespoke solution that involves elements of the above, or to provide FieldShield libraries your SQL procedures can invoke in-situ like these to protect or reveal (encrypt or decrypt) query results, query tables, materialized views, and so on.
Contact fieldshield@iri.com for help integrating a dynamic data masking function like redaction or format preserving encryption into your application.