PII Lookup File Searches
Editor’s Note: This article was written in 2016. Since 2023, more robust PII discovery capabilities have been available for data in structured, semi-structured, and unstructured (RDB, NoSQL DB, file, document, and image) sources using IRI DarkShield, including Named Entity Recognition (NER) model matchers. Please see this article on the IRI Data Class & Rule Library (Data Classification) and its links to supported PII Location and Data Matchers. FieldShield also has new RDB schema and flat-file (directory) data class search wizards.
IRI provides multiple data discovery features for personally identifiable information (PII) and other sensitive or need-to-be-found data held in enterprise sources.
Beyond the data class, pattern-match, and fuzzy-match searches described elsewhere in this blog, this article primarily discusses the search for values held in a lookup or ‘set’ file (e.g., a list of names). The feature is supported in the IRI FieldShield data masking product for databases and files, the IRI CellShield Enterprise Edition (EE) data masking product for Microsoft Excel spreadsheets, and the IRI Voracity platform for data lifecycle management.
Specifically, the new search capability is built into the wizards for database profiling, flat-file profiling, and dark data discovery in the IRI Workbench GUI (built on Eclipse), which supports FieldShield and all other IRI software. The same string-search feature was also added to CellShield EE for masking data in Excel spreadsheets.
Value Searches in DBs & Files
To use this lookup feature in the database or flat-file profiling wizards, find the Column (DB profiler) or Field (file profiler wizard) Selection page, and select the check box for Expression Search. On the next page, select Values File from the Search Type list. Then browse to find and select the set file with the values to be searched. Complete the rest of the fields on the page, and click Finish.
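To illustrate what a values-file search does conceptually, here is a minimal Python sketch that counts, per column, the cells in a delimited file that exactly match an entry in a set file. It is not the wizard's implementation; the function names (`load_set_file`, `count_matches_by_column`), the one-value-per-line set file layout, and the sample data are all assumptions made for this example.

```python
import csv
import io


def load_set_file(text):
    """Return the lookup values, assuming one value per non-empty line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def count_matches_by_column(csv_text, lookup_values):
    """Count, per column, the cells that exactly match a lookup value."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    counts = dict.fromkeys(header, 0)
    for row in data:
        for name, cell in zip(header, row):
            if cell.strip() in lookup_values:
                counts[name] += 1
    return counts


# Hypothetical set file and flat file contents
names = load_set_file("Alice\nBob\nCarol\n")
sample = "id,first_name,city\n1,Alice,Paris\n2,Dave,Bob\n3,Carol,Rome\n"
column_hits = count_matches_by_column(sample, names)
print(column_hits)
```

A column with a high match count against a names list is a likely candidate for classification as PII; a stray match in another column (like `city` here) may just be a coincidence worth reviewing.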
Value Searches in Text, Documents and other Unstructured Data Sources
The Dark Data Discovery wizard can also find values listed in lookup files. Because lookup files are plain text, the wizard can apply them alongside its other search methods to find data in any text file, Microsoft Office or PDF document, MongoDB or Cassandra, or popular image file formats (even images embedded in documents). The wizard extracts and buckets both the values it finds and the metadata for the files in which those values are found into a delimited flat file, or into an Excel Interchange File (EIF) for use in CellShield EE. Note, however, that this wizard is usually used with the IRI DarkShield tool for finding and masking pre-classified PII hidden in unstructured data sources.
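The general idea of scanning free text for set file values and writing the hits to a delimited report can be sketched as follows. This is only an illustration under stated assumptions: the function names, the pipe delimiter, the report columns (`source`, `value`, `offset`), and the in-memory sample documents are hypothetical and do not reflect the wizard's actual output layout.

```python
import csv
import io
import re


def find_values(documents, lookup_values):
    """Yield (source, value, offset) for each whole-word match in each text."""
    # Longest values first, so a longer entry wins over a shorter prefix of it
    ordered = sorted(lookup_values, key=len, reverse=True)
    if not ordered:
        return
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    for source, text in documents.items():
        for match in pattern.finditer(text):
            yield source, match.group(0), match.start()


def to_delimited(hits, delimiter="|"):
    """Serialize the hits as a delimited flat-file report."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["source", "value", "offset"])
    writer.writerows(hits)
    return buf.getvalue()


# Hypothetical documents, already extracted to plain text
docs = {
    "memo.txt": "Alice met Bob at noon.",
    "notes.txt": "Bobby called; no reply from Carol.",
}
report = to_delimited(find_values(docs, {"Alice", "Bob", "Carol"}))
print(report)
```

The word-boundary anchors keep "Bob" from matching inside "Bobby". Real documents (Office, PDF, images) would first need text extraction or OCR, which this sketch leaves out.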
Value Searches in Excel
Alternatively, CellShield EE’s new Set File [based] Remediation feature can find values in any Excel 2010 or 2013 spreadsheet that also appear in a set file, allowing you to mask those values via encryption, redaction, or pseudonymization. You upload the set file, choose the preferred protection function, and click “Remediate.” A popup lets you know when the operation is done.
Pseudonymization, by way of example, is a good way to de-identify names or other proper nouns while preserving realism in the target. Pseudonyms can be reversible, or not.
- In your worksheet, click the Import Set File icon in the CellShield ribbon to open the Set File Search utility.
- Browse to the set file with the names you are looking for, and try to load it. You will get an error if there’s a problem with the file — it must be a list of ASCII values delimited by a space or carriage return. Click OK to continue.
- For the Remediation Type here, we’re choosing Pseudonymization, though we could also choose redaction (full/partial cell), or encryption (AES 128, FPE AES 256, etc.), instead.
- Click on Find Matches, and the Menu gives a count of the matches found, highlighted in red. Click OK to continue.
- Tick Recoverable to save a restore set, or Non-Recoverable to prevent data restoration.
- For Recoverable, a Restore file is automatically created in the “CX-Pseudo_Restore” folder on the local drive.
- The original Set File is scrambled to randomly create pseudonym (substitute) values, which will also get saved into a recovery file for optional restoration later.
- Click Remediate to pseudonymize the names in the sheet. You can see the scrambled names are now in place.
- You can restore the original names by using the recovery file. Click the restore tab in the set file module. Navigate to that file in the CX-Pseudo_Restore folder, and click the Restore button.
This feature is also offered in Bulk Remediate mode. Using your .eif file, you can protect every set file item (lookup value) found across all the discovered sheets at once, in the same way, with the set file remediation function you choose.
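The recoverable pseudonymization steps listed above can be approximated in a few lines of Python. This is a conceptual sketch, not CellShield's implementation: the function names, the in-memory list standing in for worksheet cells, and the dictionary standing in for the restore file are all assumptions. It loads a set file of ASCII values separated by spaces or line breaks, shuffles that list to assign each value a substitute from the same list, and keeps a reverse map for restoration.

```python
import random


def parse_set_file(raw):
    """Split raw bytes into values; non-ASCII bytes raise UnicodeDecodeError."""
    text = raw.decode("ascii")
    return text.split()  # splits on spaces, carriage returns, and line feeds


def build_pseudonym_map(values, seed=None):
    """Map each value to a different value drawn from the same shuffled list."""
    unique = list(dict.fromkeys(values))
    if len(unique) < 2:
        raise ValueError("need at least two distinct values to swap")
    shuffled = unique[:]
    random.Random(seed).shuffle(shuffled)
    # Pair each value with its successor in the shuffled order (cyclically),
    # so the mapping is one-to-one and no value maps to itself.
    return dict(zip(shuffled, shuffled[1:] + shuffled[:1]))


def remediate(cells, mapping):
    """Replace matching cells; return the masked cells and a restore map."""
    restore = {}
    masked = []
    for cell in cells:
        if cell in mapping:
            restore[mapping[cell]] = cell
            masked.append(mapping[cell])
        else:
            masked.append(cell)
    return masked, restore


def restore_cells(cells, restore):
    """Reverse the substitution using the saved restore map."""
    return [restore.get(cell, cell) for cell in cells]


# Hypothetical set file bytes and worksheet column
mapping = build_pseudonym_map(parse_set_file(b"Alice Bob\r\nCarol"), seed=7)
cells = ["Alice", "widget", "Bob", "Carol", "Alice"]
masked, restore = remediate(cells, mapping)
print(masked)
print(restore_cells(masked, restore))
```

Discarding the `restore` map corresponds to the Non-Recoverable option: without it, the substitution cannot be reversed from the masked sheet alone. Because the mapping is one-to-one, the same original name always receives the same pseudonym, which keeps repeated values consistent within the sheet.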