DarkShield PII Discovery & Masking Charts

by Devon Kozenieski

As you may already know, IRI DarkShield produces search annotations and masking results with every operation. These log files can contain a large volume of data, including:

metadata about the file/database source(s)
what data was found as a match
which matcher matched the data
how the data was masked (if it was)
where matches occurred within each file or database.

In addition to reviewing or exporting these machine-readable logs to analytic platforms like DataDog or Splunk, DarkShield V5 users can now also customize and automatically create visually appealing, interactive, and informative HTML5 dashboard charts from the IRI Workbench GUI for DarkShield.

You can configure and create the DarkShield dashboard charts after running DarkShield in any of these modes; the results available for charting depend on the mode:

Search Only (produces search annotations with PII location information)
Mask Only (produces masking results; a prior Search job is required)
Search and Mask (produces masking results)

After running any of the above jobs, right-click on your DarkShield job (.dsc) file in the applicable IRI Workbench project folder. Select the DarkShield Aggregate Results Report graph option. This will bring up a one-page wizard where you can specify custom parameters for the graph report:

DarkShield Aggregate Results Report Options

Colors can be mapped to specific silos in the bubble chart

The number of bubbles shown is also a configurable option; in this image, there is no restriction on the number of bubbles, so all silos display together. You can hover over each bubble for the data silo name.

You can limit the maximum number of entries for various components of the DarkShield aggregate results report. You can also configure the type and scale of some charts, as well as colors for the bubble chart.

After completion of that single-page wizard, a job will run to aggregate all of the search annotations and masking results within or under the same directory as the DarkShield job. Upon completion of the log aggregation job, an HTML file called {your-darkshield-jobname}-graph.html is produced in a folder named DarkShield Aggregate Results Report within the same folder as the DarkShield job.

Open the HTML dashboard file in an external, modern web browser such as Google Chrome or Firefox. The charts will not render in the internal Workbench (Eclipse) web browser because it lacks the modern JavaScript and HTML5 features the charts rely on.
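As a convenience, the report location described above can be derived from the job file path and opened in the system default browser. This is a minimal sketch, not part of DarkShield: the helper names report_path and open_report are hypothetical, and it assumes the job name in the report file name is the .dsc file name without its extension.

```python
from pathlib import Path
import webbrowser

# Folder name as given in the article.
REPORT_FOLDER = "DarkShield Aggregate Results Report"


def report_path(job_file: str) -> Path:
    """Return the expected dashboard path for a DarkShield job (.dsc) file.

    Assumption: the job name is the job file name without the extension.
    """
    job = Path(job_file)
    return job.parent / REPORT_FOLDER / f"{job.stem}-graph.html"


def open_report(job_file: str) -> bool:
    """Open the dashboard in the default external browser, if it exists."""
    report = report_path(job_file)
    if not report.is_file():
        return False  # aggregation job has not produced the report yet
    return webbrowser.open(report.resolve().as_uri())
```

For example, open_report("MyProject/scan.dsc") would look for MyProject/DarkShield Aggregate Results Report/scan-graph.html and return False if that file is not there.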

The dashboard can show up to five component areas (charts):

1. The first area is a bubble chart that ranks the top 100 unprotected sources by the number of PII matches to DarkShield data class search criteria. Hover over each bubble to reveal the data source containing PII and the number of matches found.

2. The second area is a detailed table containing the name, data class, number of matches, and masked status for each data silo and data class combination. There are search bars above the table for filtering entries, and you can sort the table by any column using its header.

3. The third area is a table of failed (masking) results (if any). This area is not shown if there are no failed results.

The table includes a detailed identifier naming the silo, and a description of the failure. You can sort the table by clicking on its headers, and filter results by entering search terms in the search bars above for data silo name and reason for failure.

The search is based on whether an entry merely contains the search term entered; so, for example, .json could be used as a search term in the search bar labeled “Search for silo names…” to limit entries to JSON files.

Possible error search terms include ‘encode’ or ‘empty’; browse the types of errors displayed in the main view for more ideas. The DarkShield manual lists the failure reasons.
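The contains-style matching described above can be sketched in a few lines of Python. The rows and failure texts below are invented for illustration and are not actual DarkShield output; the sketch also assumes case-insensitive matching, which the dashboard may or may not use.

```python
# Invented example rows; real failure reasons are listed in the DarkShield manual.
failed_rows = [
    {"silo": "exports/customers.json", "reason": "could not encode masked value"},
    {"silo": "warehouse/sales.parquet", "reason": "masked value does not meet schema constraints"},
    {"silo": "exports/orders.json", "reason": "empty result"},
]


def filter_rows(rows, silo_term="", reason_term=""):
    """Keep rows whose silo name and failure reason contain the given terms."""
    silo_term = silo_term.lower()
    reason_term = reason_term.lower()
    return [
        row for row in rows
        if silo_term in row["silo"].lower()
        and reason_term in row["reason"].lower()
    ]


# Limit entries to JSON files, as in the ".json" example.
json_only = filter_rows(failed_rows, silo_term=".json")
```

An empty search term matches every row, which mirrors a search bar left blank.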

A common failure reason for Parquet files is that a masked value does not meet the constraints of the Parquet file schema. For example, a number that has been redacted with asterisks is no longer a number and would not meet the schema’s constraints.

You can find entries of this type by entering the word “constraints” into the search bar. To remedy the failure, modify your DarkShield job so that the data class that matched the “Double” type value is paired with a masking rule that keeps the masked value in the format of a double. In this case, format-preserving encryption would be a suitable option.
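To illustrate why asterisk redaction breaks a numeric column, here is a small Python sketch. It uses Python's float() parsing as a stand-in for a Double schema check, and a toy digit substitution as a stand-in for a format-preserving mask. Neither function is DarkShield code, and the toy substitution is not encryption and offers no security.

```python
def fits_double(text: str) -> bool:
    """Stand-in for a schema check: can the text be parsed as a double?"""
    try:
        float(text)
    except ValueError:
        return False
    return True


def redact(text: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(text)


def toy_format_preserving_mask(text: str, shift: int = 7) -> str:
    """Shift each ASCII digit; keep signs and decimal points in place.

    Illustration only: this is NOT format-preserving encryption.
    """
    return "".join(
        str((int(ch) + shift) % 10) if ch in "0123456789" else ch
        for ch in text
    )


original = "1234.56"
print(fits_double(redact(original)))                      # False
print(toy_format_preserving_mask(original))               # 8901.23
print(fits_double(toy_format_preserving_mask(original)))  # True
```

The redacted value fails the numeric check, while the value that keeps its numeric shape still parses, which is the property a format-preserving rule is meant to provide.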

4. The fourth area is a pie, bar, or doughnut chart (depending on the configuration specified) showing the distribution of sensitive data silos based on the highest total number of matches. Data for this chart is aggregated from all types of sources; i.e.,

sources only searched
sources successfully searched and masked; and,
sources searched and masked but with failed masking results.

5. The fifth area is a pie, bar, or doughnut chart showing the distribution of sensitive data types (i.e. data classes) based on the highest total number of matches. Data for this chart is aggregated from all types of sources (i.e., the same as those listed above).
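The two distribution charts can be thought of as match totals grouped by data silo and by data class, optionally capped at a maximum number of entries. The following Python sketch shows that grouping under stated assumptions: the record layout, field names, and sample values are hypothetical and do not reflect the actual DarkShield annotation or results format.

```python
from collections import Counter

# Hypothetical, simplified records: one per data silo and data class pair.
records = [
    {"silo": "customers.csv", "data_class": "EMAIL", "matches": 120},
    {"silo": "customers.csv", "data_class": "NAME", "matches": 80},
    {"silo": "orders.json", "data_class": "EMAIL", "matches": 30},
    {"silo": "hr.parquet", "data_class": "SSN", "matches": 45},
]


def totals_by(rows, key, limit=None):
    """Sum matches per value of `key`, highest total first.

    `limit` caps the number of entries returned; None returns all of them.
    """
    counts = Counter()
    for row in rows:
        counts[row[key]] += row["matches"]
    return counts.most_common(limit)


print(totals_by(records, "silo", limit=2))
# [('customers.csv', 200), ('hr.parquet', 45)]
print(totals_by(records, "data_class"))
# [('EMAIL', 150), ('NAME', 80), ('SSN', 45)]
```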

Known Limitations

Keep the following known limitations in mind when using the DarkShield Aggregate Results Report command:

Cassandra NoSQL DB annotations/results cannot yet be processed.
Huge individual annotations/results files exceeding memory cannot be processed.
The annotations/results of embedded binary RDB BLOB columns are not aggregated.

Conclusion

This customizable DarkShield dashboard gives you an easy, built-in way to gain insight from the raw log data in search annotations and masking results. The charts allow you to quickly visualize several things, including:

The top data silos where sensitive data (matches) were found (via bubble chart), if the DarkShield job was run in search-only mode;
Both broad and granular views of masking status by data silo, from whole file repositories and RDB / NoSQL databases down to individual files, tables, or collections;
The distribution of sensitive data at a broad or granular (file, table, collection) level; and,
The reason for failed masking results (if any) along with the name of the associated resource.

If you have any questions about this dashboard or need help using it, please email darkshield@iri.com.
