Masking PHI in FHIR EDI Files with DarkShield
Introduction
The IRI DarkShield data masking tool includes fit-for-purpose wizards in its IRI Workbench GUI to search (classify) and mask (remediate) sensitive “dark data” (as defined by Gartner) in many semi-structured, unstructured, and structured file, NoSQL, and relational database (RDB) sources.
This article focuses on a wizard that can discover and de-identify PHI in healthcare documents in FHIR (Fast Healthcare Interoperability Resources) format. See this article related to HL7 files, or this article related to X12 files if you need to find and mask PHI in those EDI formats.
Though not discussed in this article, DarkShield also includes wizards for other data sources: the New File Search/Masking Job … wizard for sensitive data in many other file, document, and image formats; the New NoSQL DB Search/Masking Job… wizard for popular NoSQL databases; and the New Relational DB Search/Masking Job… wizard for RDB schemas.
What the DarkShield Healthcare Wizard Does
The “New Healthcare Search/Masking Job …” wizard in the IRI Workbench GUI for DarkShield creates a DarkShield Job (.dsc file) with the Search and Mask Contexts applicable to your job.
Search Contexts contain instructions on how to search for PII in a source. Mask Contexts contain instructions on how to mask the PII found during the search, and how to access the target where the masked data will be sent.
The searching and masking of data in FHIR documents is based on Data Classes you pre-define and store in an IRI Data Class and Rule Library, or create directly from this wizard. Each Data Class contains one or more search methods called Search Matchers to find PII/PHI values.
Searches of FHIR documents rely primarily on location (path metadata) matchers, but you can combine those with content-based data matchers to widen the search range (at the cost of a longer search time). For more information on the various search methods available, read about Location Matchers and Data Matchers.
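To illustrate the difference between the two matcher types, here is a minimal Python sketch. It is not DarkShield code and does not use the DarkShield API: the helper names (walk, find_matches), the path patterns, the phone-number regex, and the sample Patient resource are all hypothetical and chosen only for illustration. A location matcher selects a value by where it sits in the document, while a data matcher inspects the value itself.

```python
import json
import re
from fnmatch import fnmatchcase

# Hypothetical path patterns (location matchers) and content pattern (data matcher).
LOCATION_PATTERNS = {
    "Name": ["name.*.family", "name.*.given.*"],
    "Gender": ["gender"],
}
DATA_PATTERNS = {"Phone Number": re.compile(r"^\d{3}-\d{3}-\d{4}$")}


def walk(node, path=""):
    """Yield (dotted_path, value) for every scalar leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}.{index}" if path else str(index))
    else:
        yield path, node


def find_matches(document):
    """Return (data_class, path, value) tuples found by either matcher type."""
    matches = []
    for path, value in walk(document):
        # Location matching: only the path is examined.
        for data_class, patterns in LOCATION_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                matches.append((data_class, path, value))
        # Data matching: only the value is examined.
        for data_class, regex in DATA_PATTERNS.items():
            if isinstance(value, str) and regex.match(value):
                matches.append((data_class, path, value))
    return matches


# Made-up sample resource for illustration.
patient = json.loads("""
{"resourceType": "Patient",
 "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
 "telecom": [{"system": "phone", "value": "555-555-0100"}],
 "gender": "male"}
""")

for data_class, path, value in find_matches(patient):
    print(data_class, path, value)
```

In this sketch the name and gender values are found purely by path, while the phone number is found purely by content, which is why adding data matchers means every value has to be scanned.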
While DarkShield previously supported FHIR documents, this wizard provides a user-friendly front-end to configure the jobs.
Prerequisites
You may need to update your DarkShield API and Workbench IDE; see this YouTube video for instructions, and email darkshield@iri.com if you need help or have any questions. Before launching the DarkShield Healthcare wizard, ensure these preliminary steps are completed:
1. First, verify that the DarkShield API distribution directory is specified in IRI Workbench under Preferences > IRI > DarkShield. From here, you can configure DarkShield GUI and API preferences, including the host, port, and directory holding the DarkShield API. If the DarkShield API distribution (Plankton) field is empty, or if you are using a different distribution, you will need to manually update the specified folder to reflect the location of the DarkShield API.
2. Second, all DarkShield Wizards require a project possessing an IRI Data Class and Rule Library. The library can be empty, as this wizard can create data classes and rules for you. Learn about data classes and rules from this article.
IRI Project Containing an IRI Data Class and Rule Library
3. Finally, verify that the Plankton (DarkShield API) server is running. The DarkShield API Status view (bottom of the screen) displays whether the API server is running.
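If you prefer to check from outside the Workbench, a rough Python sketch like the one below can confirm that something is listening on the configured host and port. The function name api_is_reachable is hypothetical, and the host and port shown are placeholders; substitute the values from your own DarkShield preferences. A successful TCP connection only shows that a listener exists; it does not prove the listener is the DarkShield API.

```python
import socket


def api_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder values (assumptions): use the host and port shown in
# Preferences > IRI > DarkShield.
HOST, PORT = "localhost", 8959
print("reachable" if api_is_reachable(HOST, PORT) else "not reachable")
```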
Using the Wizard
To access the wizard, navigate to the DarkShield menu dropdown from the top toolbar menu in IRI Workbench and choose ‘New Healthcare Search/Masking Job…’.
This action opens the first page, where you will specify the source type, in this case, FHIR.
After selecting the source type, click Next. The next page will prompt you to specify the name and location of the DarkShield job you are building.
You also have the option to use previously created Data Classes or Data Class Groups. Doing so shortens the job setup, because the wizard reuses data classes that were already defined or modified in the library or in a previous run of this wizard.
If this is your first time defining a DarkShield job for FHIR, however, we recommend that you not use this option, because the next pages of the wizard will take you through the specific types of FHIR paths you should classify.
After the setup page is the PII General and Advanced Selection Page. Here, predefined “groupings” of common Personally Identifiable Information (PII) or Protected Health Information (PHI) types found within healthcare documents are presented.
It’s important to note that while these groupings cover many PII instances, they may not encompass all of them within a document. Therefore, you should carefully review all information during job creation to ensure the accuracy of your data discovery and de-identification results.
On this page, we can select these groupings for easier searching and masking of PII / PHI:
The “Advanced” section of the wizard gives you more granular control over specific path matchers. For this example, we will be using our pre-defined groupings of “Name”, “Phone Number”, and “Gender”.
The next wizard page is the ‘Rule Selection Wizard Page,’ where you associate the information selected in previous wizard pages with masking rules. Note that any information lacking a rule will be skipped during the search and masking process.
On this page, you’ll find a dropdown menu for selecting groupings defined on an earlier wizard page, including information from the advanced selections. Once you’ve selected a grouping, you can apply a rule to it by selecting it from the “Select Rule” dropdown and clicking on ‘Add Group Rule.’
There is also an optional dropdown menu called Type. This permits the application of a rule solely to the selected type within the grouping. At the moment, JSON is the only type available, but future versions of the FHIR wizard will support XML, too.
Within the table, you can also click on any rule cell and use its dropdown menu to select a rule for that specific item. This provides even more granular control of the masking job.
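The relationship between group rules, per-item rule cells, and skipped items can be pictured with a small Python sketch. This is only an illustration of the logic described above, not how the wizard is implemented: the item paths, the rule names, and the resolve_rule helper are hypothetical, and the sketch assumes that a rule chosen in an individual cell takes precedence over the rule applied to its grouping.

```python
from typing import Optional

# Hypothetical items carried over from the selection page: (grouping, path) pairs.
selected_items = [
    ("Name", "name.family"),
    ("Name", "name.given"),
    ("Phone Number", "telecom.value"),
    ("Gender", "gender"),
]

# Hypothetical rule assignments; the rule names are illustrative only.
group_rules = {"Name": "FPE Rule", "Phone Number": "FPE Rule"}
item_overrides = {("Name", "name.given"): "Redaction Rule"}


def resolve_rule(grouping: str, path: str) -> Optional[str]:
    """Return the per-item rule if set, else the group rule, else None."""
    return item_overrides.get((grouping, path), group_rules.get(grouping))


for grouping, path in selected_items:
    rule = resolve_rule(grouping, path)
    # Items without a rule are skipped during search and masking.
    print(f"{grouping:13} {path:15} {rule or 'skipped (no rule)'}")
```

Here the Gender item ends up with no rule, so it would be left out of the job, mirroring the note that information lacking a rule is skipped.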
The “Create Rule” button allows you to create a rule directly from this wizard. For more information about creating a rule, or what the masking rules are, refer to this article.
Since this wizard uses Data Classes, you can apply any predefined masking rule here. Afterward, you can also add other Data Classes to the .dsc file, combining manual entries with the ones the wizard generated.
The last two wizard pages are used to define or modify the locations of your unmasked file sources and masked targets, respectively. You can specify local PC folders, LAN file system paths, or cloud buckets.
By default, each target file will have the same name as the source. If you specify more than one target, the masked files will be copied into every defined folder.
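As a quick illustration of the default naming behavior, the following Python sketch maps one source file to a same-named file in each target folder. The function name target_paths and the folder and file names are hypothetical; the sketch only computes paths and does not copy or mask anything.

```python
from pathlib import Path


def target_paths(source_file: str, target_folders: list[str]) -> list[Path]:
    """Map one source file to a same-named file in every target folder."""
    name = Path(source_file).name
    return [Path(folder) / name for folder in target_folders]


# Hypothetical source file and target folders.
for path in target_paths("fhir_in/patient.json", ["masked_a", "masked_b"]):
    print(path.as_posix())
```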
When you click Finish, a DarkShield job configuration (.dsc) file is created in the “Project Explorer” folder you specified on the job setup page. New FHIR data class groups, containing the data classes you just defined in the wizard, are also created inside the “iriLibrary.dcrlib” file in your project.
Within these Data Classes, every detail can be altered without the need to revisit the wizard. This flexibility enables you to modify, add, or remove any Data Classes, significantly reducing setup time for future jobs, especially if requirements change.
Double-click on your .dsc file to see the ‘DarkShield Job Details Page’. Review the source, target, and data class information. It’s important to note that any changes made to items in the .dcrlib file require selecting the ‘Modify’ button and completing the wizard to ensure those changes are reflected in the .dsc file.
To run your job, right-click on the .dsc file and select the type of job you want. In this case, we chose a combined search and masking job.
Upon examining the sample FHIR source and target files below, we can see our net of groupings working its magic. With only a few clicks of the mouse, we were able to identify and mask common PII/PHI types.
One thing to mention is that this process works regardless of which document type is being used. Our groupings cover all available document types for FHIR, X12, and HL7, and are derived directly from their documentation.
Another topic to consider is the masking results. In this example, we only used Format-Preserving Encryption (FPE); however, many other data masking (and synthesis) functions are supported; see this page and its links for details.
FHIR SOURCE (Unmasked)
FHIR TARGET (Masked)
In addition to the masked files, an audit log file called results.json is generated. It details which PHI data classes matched on values in the FHIR file(s), and the corresponding rules used to mask them.
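Because the layout of results.json is not described in this article, the Python sketch below makes no assumptions about its fields; it simply loads a JSON file and reports its top-level shape so you can see what the log contains before digging further. The function name summarize_json and the file location are hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level shape of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"items": len(data)}
    return {"value": type(data).__name__}


# Assumed location; point this at wherever your job wrote its audit log.
results_file = Path("results.json")
if results_file.exists():
    print(summarize_json(results_file))
```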
At the moment, JSON is the only supported FHIR format; in future updates XML will be supported as well.
It is also possible to generate aggregate visualizations of the discovered PII, and whether it was masked, in built-in dashboard charts per this article. For help using this wizard to scan and/or mask FHIR data, or data in any other source(s), send an email to darkshield@iri.com.