Redaction Options for Data Privacy
Editor's Note: This article covers only full or partial column/field redaction of strings in structured database tables or flat files, which is one of many static (and API-driven dynamic) data masking functions available in IRI FieldShield. The same redaction features are available in IRI CellShield for strings in Excel spreadsheets, and in IRI DarkShield for data in unstructured sources like text files, NoSQL DBs, documents, image files, etc.
Redaction often conjures up thoughts of government documents being reviewed by members of the public. Words, sections, and sometimes entire pages are blacked out so that classified or protected information is shielded. These sanitized documents have information obscured prior to their release.
Redaction is, by definition, the process or act of redacting a document, or a term for a work that has been redacted. To redact is further defined as (1) to frame a subject in writing, (2) to select or adapt text for publishing, or (3) to obscure or remove text from a document prior to release. This last definition is the one that applies in big data processing and data-centric protection. Users are looking for ways to protect important data, and redaction is one method of providing that protection.
Many industries rely on processing, storing, and reporting on data that are essential to their operations. Those same pieces of data, though, can often reveal sensitive information that needs to be protected. For any business or government agency, this can be information about customers, transactions, or business affairs. Whatever its source, sensitive data needs to be protected, and redaction is one way to do so.
With IRI FieldShield, you can perform the following types of field-level redactions for sensitive data stored in flat files and database tables:
1) Pre-defined masks, such as the redaction of all but the final four digits of a social security number.
2) User-defined masks, where you can customize the offsets, lengths, and redaction characters for a given field.
3) Conditional redaction, where you redact field values only if one or more conditions are met.
4) Column omission, where an entire field (or record) is left out of the output.
Consider this CSV data set from a file called patient_info1.csv:
value,name,age,email,phone,ssn,city,state,birth_date,end_date
"6644232","Adams, John","68","adams@gmail.com","646-834-9956","770-75-3849","Melbourne","Florida","1997-06-11","2000-10-19"
"68346890","Jefferson, Thomas","22","Jefferson@Thomas.com","321-890-8293","345-78-3485","Goodland","Kansas","1996-05-05","2007-05-29"
"8783945","Madison, James","44","Madison@James.com","563-435-7821","643-34-3478","Biloxi","Mississippi","2010-05-01","2010-05-17"
"94967295","Monroe, James","52","Monroe@James.com","433-758-2783","123-85-4783","Rapid City","South Dakota","2003-05-20","2007-05-16"
"49600245","Jackson, Andrew","90","Jackson@Andrew.com","145-894-4328","335-99-6655","Long Island","New York","1970-05-12","2008-05-09"
"8563853","Polk, James","77","Polk@James.com","553-563-2399","294-90-2388","Baton Rouge","Louisiana","1999-05-19","2008-05-26"
"43724","Chalse, Logan","38","Chalse@Logan.com","321-889-4633","770-73-8800","Melbourne","Florida","2008-06-06","2009-06-23"
Changes can be made to this data file using wizards and dialogs in the FieldShield GUI (IRI Workbench), built on Eclipse™. FieldShield users can specify one or more redaction requirements at the field level.
The following examples demonstrate the four different field-level redaction types previously mentioned:
Pre-defined Mask – The following screenshot shows the selection of a pre-defined mask for social security numbers:
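Outside the GUI, the effect of this kind of mask can be approximated in a few lines of Python. This is only an illustrative sketch, not FieldShield's implementation; the function name mask_ssn and its parameters are hypothetical. It assumes separators such as hyphens should be kept and only digits redacted.

```python
def mask_ssn(ssn: str, mask_char: str = "*") -> str:
    """Redact every digit except the final four, keeping separators.

    Hypothetical helper for illustration; not a FieldShield function.
    """
    digits_total = sum(ch.isdigit() for ch in ssn)
    digits_to_mask = max(digits_total - 4, 0)
    out = []
    for ch in ssn:
        if ch.isdigit() and digits_to_mask > 0:
            out.append(mask_char)   # redact a leading digit
            digits_to_mask -= 1
        else:
            out.append(ch)          # keep separators and the last four digits
    return "".join(out)


print(mask_ssn("770-75-3849"))  # ***-**-3849
```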
Define Your Own Mask – The following screenshot shows how you can customize your own redaction criteria for a given field, in this case the birth_date field, where ^ is specified as the redaction character:
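The idea of a mask defined by an offset, a length, and a redaction character can be sketched in Python as follows. The helper redact_span is hypothetical, and the zero-based offset is an assumption made for this sketch; the product's own offset convention may differ.

```python
def redact_span(value: str, offset: int, length: int, mask_char: str = "*") -> str:
    """Replace `length` characters starting at zero-based `offset` with mask_char.

    Hypothetical helper for illustration; spans that run past the end of
    the value are clipped rather than treated as errors.
    """
    start = max(offset, 0)
    end = min(start + length, len(value))
    if start >= end:
        return value  # nothing falls inside the requested span
    return value[:start] + mask_char * (end - start) + value[end:]


# Redact the four-digit year of a birth date with '^'.
print(redact_span("1997-06-11", 0, 4, "^"))  # ^^^^-06-11
```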
Conditional Redaction – Conditional redaction for the City field can be achieved with the following logic, which is easily performed with the Expression Builder within the Workbench:
IF STATE EQ "Florida" OR STATE EQ "New York" THEN MASK_CITY ELSE CITY
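The same conditional logic can be expressed as a short Python sketch. The function redact_city is hypothetical, and it assumes that the mask replaces every character of the city value (spaces included) with an asterisk, which is consistent with the sample output in this article.

```python
def redact_city(city: str, state: str,
                protected_states=("Florida", "New York"),
                mask_char: str = "*") -> str:
    """Mask the city only when the record's state is in protected_states.

    Hypothetical helper for illustration; not a FieldShield function.
    """
    if state in protected_states:
        return mask_char * len(city)  # condition met: redact the whole value
    return city                       # condition not met: pass through unchanged


print(redact_city("Melbourne", "Florida"))  # *********
print(redact_city("Goodland", "Kansas"))    # Goodland
```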
Column Omission – The entire phone number field is omitted from the output, and the positions of the remaining output fields are automatically renumbered to compensate for the missing field within the record.
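For comparison, column omission on a CSV file can be sketched with Python's standard csv module. The function drop_columns and the three-column sample are hypothetical and only illustrate the idea. Note that the csv writer re-quotes values containing commas, so its quoting will not match FieldShield's output format exactly.

```python
import csv
import io


def drop_columns(csv_text: str, omit: set) -> str:
    """Return csv_text with the named columns left out of every record.

    Hypothetical helper for illustration; not a FieldShield function.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in omit]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the omitted columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = 'name,phone,ssn\n"Adams, John","646-834-9956","770-75-3849"\n'
print(drop_columns(sample, {"phone"}))
```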
Together, these four redaction types transform the input file so that its sensitive data are protected. The following output reflects all of the operations described above and was generated in one pass through the data using FieldShield:
6644232,Adams, John,68,adams@gmail.com,***-**-3849,*********,Florida,^^^^-06-11,2000-10-19
68346890,Jefferson, Thomas,22,Jefferson@Thomas.com,***-**-3485,Goodland,Kansas,^^^^-05-05,2007-05-29
8783945,Madison, James,44,Madison@James.com,***-**-3478,Biloxi,Mississippi,^^^^-05-01,2010-05-17
94967295,Monroe, James,52,Monroe@James.com,***-**-4783,Rapid City,South Dakota,^^^^-05-20,2007-05-16
49600245,Jackson, Andrew,90,Jackson@Andrew.com,***-**-6655,***********,New York,^^^^-05-12,2008-05-09
8563853,Polk, James,77,Polk@James.com,***-**-2388,Baton Rouge,Louisiana,^^^^-05-19,2008-05-26
43724,Chalse, Logan,38,Chalse@Logan.com,***-**-8800,*********,Florida,^^^^-06-06,2009-06-23