Which IRI Data Masking Tool is Best for Excel?
Abstract: Now that XLS and XLSX files are now also directly supported in IRI FieldShield by virtue of their new /PROCESS (format) support in the CoSort 10.5 SortCL program, IRI Voracity data management platform and standalone data masking users have three different ways to “shield” sensitive data in their spreadsheets. This article was written to help you choose among them.
Do you know the difference between IRI FieldShield, DarkShield, and CellShield EE? All three of these IRI Data Protector Suite products share common search methods (like pattern or dictionary matches) and individually or globally applicable data masking functions (like encryption, redaction, hashing, and pseudonymization).
All three can thus produce identical, and potentially interchangeably reversible, ciphertext to preserve determinism and data integrity. They are also all supported components of the IRI Voracity data management platform.
Each can also find and mask Personally Identifiable / Protected Health Information (PII/PHI), and other sensitive data in MS Excel sheets. But the tools differ in Excel operation and job configuration by design, so it’s important to know which one(s) to use in which situation(s).
FieldShield for Excel
IRI FieldShield is designed to classify, search, and mask structured database and flat-file data. FieldShield uses the CoSort SortCL data definition and manipulation program to define source and target layouts, and all mapping / masking transformations. SortCL uses 4GL job scripts usually created and managed in the IRI Workbench IDE, built on Eclipse.
Searches for PII in Excel can be performed more or less globally through the Directory Data Class Search wizard in IRI Workbench. These modules allow you to recursively search all XLS and XLSX files in and below your specified Windows drive/folder.
Only the top sheet in each file is scanned by default, but specific sheets and ranges can be added individually as additional file sources in the data class library editor. This dialog is also where the global masking functions/rules can be defined, verified, or modified.
At masking time, either globally-associated functions are applied as rules to many sheets at once in batch jobs built by the Data Class File Masking wizard, or individual functions can be specified in one or more job scripts ad hoc (one per sheet). The real benefit of using FieldShield for masking data from/to Excel sheets however is the inherent mapping ability SortCL imbues.
By way of example, this FieldShield data masking script takes PII from a text tab separated file, writes an encrypted/pseudonymized version to one .xlsx file, and an unmodified version to another .xlsx file. The encrypted passwords obscure encryption keys in the job script.
/INFILE=transactions.txt /FIELD=(credit_card,POSITION=1,SEPARATOR='\t') /FIELD=(driv_lic,POSITION=2,SEPARATOR='\t') /FIELD=(name,POSITION=3,SEPARATOR='\t') /REPORT # Voracity can also sort and/or join, plus aggregate, convert, etc. below /OUTFILE=”'Sheet1'!MMM1:MMO10;transactions_PII_encrypted.xlsx” /PROCESS=XLSX /FIELD=(credit_card1=enc_fp_aes256_alphanum(credit_card,"epass:Gg87jCTR15Jro4AGE44ENw=="),POSITION=1,SEPARATOR='\t',SDEF="=MMM") /FIELD=(driv_lic1=enc_fp_aes256_alphanum(driv_lic,"epass:Gg87jCTR15Jro4AGE44ENw=="),POSITION=2,SEPARATOR='\t',SDEF="=MMN") /FIELD=(name,POSITION=3,SEPARATOR='\t',SDEF="=MMO",SET=C:\IRI\cosort100\sets\names\names_first_last.set) /OUTFILE=”'Sheet1'!M1:O10;transactions_PII_unencrypted.xlsx” /PROCESS=XLSX /FIELD=(credit_card,POSITION=1,SEPARATOR='\t',SDEF="=M") /FIELD=(driv_lic,POSITION=2,SEPARATOR='\t',SDEF="=N") /FIELD=(name,POSITION=3,SEPARATOR='\t',SDEF="=O")
IRI Workbench also automates the generation of the /FIELD layouts as well as reusable Data Definition Format (DDF) files that specify the metadata of any source (and/or target) sheet. SortCL requires statements in DDF metadata syntax to parse, mask, and map the input/output values column-by-column.
FieldShield jobs can be executed on the command line, in batch jobs, or scheduled from IRI Workbench or any automation tool. Because the SortCL syntax is capable of much more than masking, it’s also a best choice for Voracity users looking to use sheet data in ETL processes, generate and populate (RowGen) test data into sheets, cleanse and report on data in Excel, etc.
DarkShield for Excel
IRI DarkShield is designed primarily for finding and masking data in semi-structured and unstructured text and EDI files, RDB C/BLOB columns, NoSQL DB collections, images, and documents in PDF and MS Office formats — including XLS and XLSX-compatible files which may even have PII in charts and images in the sheets.
DarkShield jobs can be specified in the Dark Data Search/Masking Job wizard in IRI Workbench… or through its RPC API for files, which also supports Excel. The DarkShield API is indicated instead of Workbench when the number of PII annotations will exceed the capacity of Eclipse memory, or when XLS/X searching/masking jobs need to be called from a web service or program.
The main reason to use DarkShield is for pure masking, but not mapping, of data in Excel, or when spreadsheets are among several different mixed sources to protect. Unlike FieldShield, DarkShield does not support simultaneous data transformations, but either DarkShield paradigm can find and mask sensitive data in Excel (and other formats) simultaneously.
In fact, among all three IRI masking products, DarkShield has the most search criteria for finding data in cells. Beyond column names and pattern and dictionary matches, the DarkShield API supports fuzzy matches to lookup values, Named Entity Recognition, and area bounding boxes. It is also the only tool that can find and mask PII embedded in spreadsheet charts and images.
DarkShield is also a good choice for integrating spreadsheet masking into document management or DevOps (CI/CD) pipelines because of the familiarity and flexibility of its OpenAPI. And through “glue code” written in any language, it also is possible to customize the operational connections and destinations for the masked sheets or data extracts you may need.
It provides search results and job logs in a variety of internal formats and external tools as well.
CellShield for Excel
IRI CellShield was purpose-built for finding and masking data only in spreadsheets, and is the only IRI “shield” product that can work entirely in the Excel environment alone. CellShield is available in a low-cost Personal Edition (PE) or professional multi-sheet Enterprise Edition (EE).
CellShield PE is useful for manually searching and masking PII one sheet at a time. It features a point-and-click interface to the same encryption, redaction, and hashing functions supported in the other IRI shield products, except pseudonymization which is random and not deterministic.
CellShield EE was designed to consolidate searching and apply consistent masking operations to PII in multiple XLS and XLSX sheets. In addition to PE features, EE supports network-wide and data class-based searches from IRI Workbench (or pattern searches from the Excel UI), intra-cell searching and masking to handle values in text fields, and full-sheet masking.
CellShield EE can also display search results in charts or by exporting its results.
The same is true for its job audit trails. Logs can be sent to any location in the file system, through email, or directly into Datadog or Splunk indexes for further analytics and visualizations.
Quick Decision Matrix
If you have sensitive data in Excel sheets and the above product details are insufficiently clear, consult this feature matrix comparing what is and is not supported in the IRI masking tools:
If you still have questions about which Excel data masking tool would work best for your use case, or would like to see one or more demonstrated, please contact your IRI representative.