Define, Discover, and De-Identify PII in Dark Data
With IRI DarkShield, you can classify, find, and erase or otherwise mask sensitive information in multiple structured, semi-structured, and unstructured sources, including text, PDF, and MS Office documents; Parquet and image files; relational database tables; and NoSQL collections.
DarkShield uses shared data classes, custom search combinations, and consistent masking functions across on-premise and cloud sources. You can also extract, share, and display job results (and the attendant file metadata) in DarkShield's Eclipse environment or in your SIEM.
With DarkShield, you can comply with Right to Be Forgotten requests, deliver specific data extracts to those requesting record portability, and support data quality when handling data rectification requests. You can also save masked output either to the original files or to same-named target files and folders on your network or in the cloud.
Explore the functions and formats DarkShield supports on this site. Then arrange a free online demo to see how DarkShield can work in your environment and to get answers to your questions.
How DarkShield Works
DarkShield uses the data classification dialogs and dark data discovery wizards in the free IRI Workbench IDE, built on Eclipse™, to catalog the data you care about and to configure search and masking specifications in metadata files that are easy to share, secure, and modify. At runtime, the saved configurations can be launched from Workbench, or from any application via the CLI or RPC API.
You can run DarkShield just to search for hidden values and report on their locations and attendant file metadata. Or you can run it with remediation enabled, to obfuscate personally identifiable information (PII) for compliance with data privacy laws using a variety of masking functions. Your search and mask operations can run separately or simultaneously.
Large jobs can be load balanced through the API using an NGINX reverse proxy. Images can be pre-processed to improve scanning accuracy.
For optimal security and control, DarkShield runs on-premise by default (though it can also be installed in containers or cloud VMs that you control; we do not receive or host any data). You can also use NGINX to authenticate users, and a key vault such as Azure Key Vault to manage the encryption keys that control who can restore the original data.
Search
Run multi-threaded searches through dark data repositories system-wide or LAN-wide (via SMB), and in Amazon S3, GCP, Azure Blob, and SharePoint stores, to find the data you're concerned about or the specific values you're looking for. Many other cloud, application, and proprietary platform connectors (e.g., Kafka, Facebook, Google, MINA, and JPA) are supported or can be added.
Define your data classes and masking rules, and match them with six different search techniques:
- CSV, DB, JSON, XML, or Excel column/path filters
- RegEx pattern matching (with off-the-shelf or customizable computational validation)
- Exact or fuzzy matches to values in dictionary / lookup (set) files
- ML-facilitated named entity recognition (NER) models using OpenNLP, TensorFlow, or PyTorch
- Bounding boxes drawn around fixed areas of images
- Facial detection and recognition (module on request)
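To illustrate what "pattern matching with computational validation" and "exact or fuzzy set-file matching" mean in practice, here is a minimal, generic Python sketch; it is not DarkShield's matcher or API. It assumes a hypothetical payment-card data class: a regular expression proposes candidates, and the Luhn checksum rejects digit strings that merely look like card numbers. The fuzzy fallback uses the standard library's `difflib`, and the 0.85 similarity cutoff is an arbitrary assumption.

```python
from __future__ import annotations

import re
from difflib import get_close_matches

# Hypothetical data class: 13-16 digit payment card numbers,
# optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Computational validation: True if the candidate's digits pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """RegEx proposes candidates; only those passing validation are reported."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


def match_set_file(token: str, set_values: list[str], cutoff: float = 0.85) -> str | None:
    """Exact (case-insensitive) match against set-file values, else the closest fuzzy match."""
    by_lower = {value.lower(): value for value in set_values}
    if token.lower() in by_lower:
        return by_lower[token.lower()]
    close = get_close_matches(token.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None
```

For example, `find_card_numbers("Card 4111 1111 1111 1111, ref 4111-1111-1111-1112.")` keeps only the first number, because the second fails the checksum, and `match_set_file("Jonson", ["Smith", "Johnson"])` falls back to the fuzzy match `"Johnson"`.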
You can reuse and share your data classes, search criteria, set files, and rule matchers in project or cloud repositories. And, because DarkShield runs in IRI Workbench alongside other IRI and Eclipse tools, you can do many other things with your DarkShield search results; see Extract next.
Extract
Write your search results to a flat file that also records forensically useful metadata for each file containing the values you searched for. You can use the search report for e-discovery, for delivering data to EU citizens who request "data portability," or as proof of deletion when you honor their "right to be forgotten" in these repositories.
If you license DarkShield as part of an IRI Voracity data management platform subscription, you can further manipulate and manage this data in ETL, analytic, and notification workflows, typically without viewing the PII values themselves.
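A minimal sketch of such a report follows, assuming a hypothetical CSV layout (the column names are illustrative, not DarkShield's actual report format): each finding becomes one row, enriched with the size and modification time of the file where the value was found.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical report layout for illustration; DarkShield's real report columns may differ.
REPORT_FIELDS = ["file", "data_class", "value", "size_bytes", "modified_utc"]


def write_search_report(findings, report_path):
    """Write one CSV row per finding, plus metadata about the file it was found in.

    findings: iterable of (file_path, data_class, matched_value) tuples.
    Returns the number of rows written.
    """
    count = 0
    with open(report_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=REPORT_FIELDS)
        writer.writeheader()
        for file_path, data_class, value in findings:
            info = os.stat(file_path)  # size and modification time of the source file
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow({
                "file": os.path.abspath(file_path),
                "data_class": data_class,
                "value": value,
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
            })
            count += 1
    return count


if __name__ == "__main__":
    # Demo: one hypothetical finding in a throwaway source file.
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "notes.txt")
        with open(source, "w", encoding="utf-8") as f:
            f.write("Contact: jane@example.com\n")
        report = os.path.join(workdir, "search_report.csv")
        write_search_report([(source, "EMAIL", "jane@example.com")], report)
        with open(report, encoding="utf-8") as f:
            print(f.read())
```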
Mask
Apply width-preserving or other static data masking functions, such as:
- Encryption (format-preserving or not)
- Lookup pseudonymization
- Redaction / obfuscation
- String manipulation
- Randomization
- Bit scrambling
- Synthesis
- Encoding
- Deletion
- Hashing
- Blurring
These functions de-identify sensitive information so you can comply with data privacy laws. Masked files look identical to their unmasked counterparts, except for the masked strings. You can also write output to same-named files in cloned directory trees to ease reconciliation.
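A few of these techniques can be sketched in plain Python to show how they differ. This is a generic illustration, not DarkShield's masking library: width-preserving redaction, one-way keyed hashing with HMAC-SHA256, and deterministic lookup pseudonymization. Format-preserving encryption is deliberately omitted because it requires a vetted implementation of a mode such as NIST's FF1. The key and lookup list used with these functions are hypothetical placeholders.

```python
from __future__ import annotations

import hashlib
import hmac


def redact(value: str, mask_char: str = "*", keep_last: int = 0) -> str:
    """Width-preserving redaction: mask every character, optionally keeping the last few."""
    keep = min(keep_last, len(value))
    return mask_char * (len(value) - keep) + value[len(value) - keep:]


def keyed_hash(value: str, key: bytes) -> str:
    """One-way hashing with HMAC-SHA256; the same value and key always yield the same digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(value: str, lookup: list[str], key: bytes) -> str:
    """Deterministic lookup pseudonymization: a keyed hash selects the replacement,
    so a given value maps to the same pseudonym wherever it appears."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return lookup[int.from_bytes(digest[:8], "big") % len(lookup)]
```

For example, `redact("123-45-6789", keep_last=4)` returns `"*******6789"`, the same width as the input. Because the pseudonym is derived from a keyed hash, the same input maps to the same replacement in every file, which keeps masked values consistent across sources; in practice the key would be held in a key vault rather than in code.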
Masking jobs are easy to modify and schedule. Subsequent search/mask operations will automatically cover new files in the source folders as well as those updated since the last search.
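The incremental behavior and the cloned-tree output described above can be approximated as follows. This minimal sketch assumes change detection by file modification time compared against a stored last-run timestamp; the function names are hypothetical, and DarkShield's own change tracking may work differently (for example, files copied with preserved timestamps would not be caught by modification time alone).

```python
from __future__ import annotations

import os
import tempfile


def changed_since(source_root: str, last_run: float) -> list[str]:
    """Return files under source_root modified after last_run (a POSIX timestamp).

    Uses modification time only, a simplifying assumption for this sketch.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run:
                changed.append(path)
    return sorted(changed)


def cloned_target(path: str, source_root: str, target_root: str) -> str:
    """Map a source file to the same-named file in a cloned directory tree."""
    return os.path.join(target_root, os.path.relpath(path, source_root))


if __name__ == "__main__":
    # Demo: every file in a fresh directory counts as changed since time 0.
    with tempfile.TemporaryDirectory() as src:
        open(os.path.join(src, "report.docx"), "w").close()
        for changed in changed_since(src, last_run=0.0):
            print(changed, "->", cloned_target(changed, src, "masked"))
```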
Audit
As DarkShield runs, it reports overall job status in a real-time progress bar. When each job completes, DarkShield generates a report of the values it found, along with the accompanying file metadata you wanted to see.
If you told DarkShield to mask, it also reports which files were masked and which were not completely masked. All search and masking job configuration details, including data classes and rule matchers, are saved and available for inspection locally or in secure repositories.
Query, analyze, and format the results of your search and mask operations with built-in reporting and visualization functionality. After DarkShield runs, right-click the results file to display information about the search and masking operations. If data found in a search could not be masked, you'll know, and you can check the DarkShield error log and data model to learn why and fix the problem.
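As an example of the kind of query you might run on results, the sketch below assumes a hypothetical CSV results file with `file`, `data_class`, and `masked` columns (DarkShield's actual result schema may differ). It counts findings per data class and lists the files that still contain unmasked values.

```python
import csv
import io
from collections import Counter


def summarize(report_file):
    """Summarize a hypothetical results CSV with columns file, data_class, masked.

    Returns (findings per data class, sorted list of files not completely masked).
    """
    per_class = Counter()
    incomplete = set()
    for row in csv.DictReader(report_file):
        per_class[row["data_class"]] += 1
        if row["masked"].strip().lower() != "true":
            incomplete.add(row["file"])
    return per_class, sorted(incomplete)


if __name__ == "__main__":
    sample = io.StringIO(
        "file,data_class,masked\n"
        "a.pdf,SSN,true\n"
        "a.pdf,EMAIL,false\n"
        "b.docx,EMAIL,true\n"
    )
    counts, not_fully_masked = summarize(sample)
    print(dict(counts), not_fully_masked)
```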
Alternatively, you can forward DarkShield log data directly to:
- a SIEM/SOC tool, such as Splunk Enterprise Security (ES), for custom display or alerting requirements
- the CoSort SortCL data mapping program in Voracity, to produce custom 2D reports; DarkShield creates metadata for SortCL use in custom log query and reporting operations
- another cloud dashboard or KNIME, both in the same Eclipse UI, for BI or analytic needs, respectively.
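For the SIEM route, one common mechanism is Splunk's HTTP Event Collector (HEC), which accepts JSON events over HTTPS with an `Authorization: Splunk <token>` header. The sketch below only builds the request. The URL, token, `sourcetype`, and record fields are hypothetical, and it assumes you forward records yourself rather than through a DarkShield-provided integration.

```python
import json
import time
import urllib.request


def build_hec_request(hec_url, token, record, sourcetype="darkshield:log"):
    """Build (but do not send) a POST to a Splunk HTTP Event Collector endpoint.

    hec_url: e.g. "https://splunk.example.com:8088/services/collector/event" (hypothetical host).
    record: dict of hypothetical DarkShield log fields; its "time" is used if present.
    """
    body = json.dumps({
        "time": record.get("time", time.time()),
        "sourcetype": sourcetype,
        "event": record,
    }).encode("utf-8")
    return urllib.request.Request(
        hec_url,
        data=body,
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the event is then a matter of calling `urllib.request.urlopen(request)` once a real HEC endpoint and token are configured.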
What DarkShield Supports
File Formats
Text | Documents | Images |
---|---|---|
fixed & delimited flat | doc/x | bmp |
eml & html (scan only) | ppt/x | gif |
hl7, x12 & fhir | xls/x | jpg/x/2 |
json & xml | | png |
txt | rtf (scan only) | tif/f |
log (various) | Parquet | DICOM |
Data Silos & Databases*
LAN & Related | Amazon | More Clouds/Apps | Additional Sources |
---|---|---|---|
Local & SMB | CloudWatch | Box & Salesforce | Couchbase, Redis, Solr |
FTP/HTTP/MINA | DynamoDB | Elasticsearch | Cassandra, Cosmos DB, MongoDB |
Azure Blob | RDS | Facebook & LinkedIn | Google Bigtable & HBase |
GCP Storage | Redshift | Google Apps | JDBC (RDBs) & JPA |
SharePoint / OneDrive | S3 Buckets | jclouds | Kafka & MQTT |
DarkShield supports files accessible directly in local or SMB-compatible LAN systems, cloud-mounted drives such as Dropbox and OneDrive, Google Cloud Storage, Azure Blob storage, and Amazon S3 buckets, plus RDB tables and NoSQL collections and clusters. Connectors for some of the other protocols listed above, along with several others, can be developed.
Please email darkshield@iri.com about your use case.