Masking PHI in X12 EDI Files with DarkShield
Introduction
The IRI DarkShield data masking product includes fit-for-purpose wizards in the IRI Workbench IDE to search (classify) and mask (remediate) sensitive “dark data” (as defined by Gartner) in many semi-structured, unstructured, and structured file, NoSQL, and relational database (RDB) sources.
This article focuses on a wizard that can discover and de-identify PHI in healthcare documents in X12 electronic data interchange (EDI) format; see this article if you have HL7 files to mask.
Though not discussed in this article, DarkShield also includes wizards for other data sources: the New File Search/Masking Job… wizard for sensitive data in files, the New NoSQL DB Search/Masking Job… wizard for sensitive data in various NoSQL databases, and the New Relational DB Search/Masking Job… wizard for sensitive data inside RDB schemas.
What the DarkShield Healthcare Wizard Does
The “New Healthcare Search/Masking Job …” wizard in the IRI Workbench GUI for DarkShield creates a DarkShield Job (.dsc file) with the Search and Mask Contexts applicable to your job.
Search Contexts contain instructions on how to search for PII in the source. Mask Contexts contain instructions on how to mask the PII found during the search and how to access the target where the masked data will be sent.
The searching and masking of data in X12 documents is based on Data Classes you pre-define and store in an IRI Data Class and Rule Library, or create directly from this wizard. Each Data Class contains one or more search methods called Search Matchers to find PII/PHI values.
X12 documents specifically use search matchers based on location (path metadata), but you can also combine those with content-based data matchers to widen your search, at the cost of a longer search time. For more information on the various search methods available, read about Location Matchers and Data Matchers.
While DarkShield previously supported X12 documents, this wizard provides a user-friendly front-end to configure the jobs. DarkShield also includes other improvements, such as support for sub-fields and repeating segments, which we cover later in this article.
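To make the difference between the two matcher styles concrete, here is a minimal Python sketch. It is not DarkShield code and does not use the DarkShield API: the function names, the sample segments, and the delimiters (`~` between segments, `*` between elements) are assumptions for illustration only. A location-style match picks a value by where it sits in a segment, while a content-style match tests every value against a pattern.

```python
import re

# Assumed delimiters; real X12 files declare their own in the ISA header.
SEGMENT_END = "~"
ELEMENT_SEP = "*"

def parse_segments(document):
    """Split an X12 string into lists of elements, one list per segment."""
    segments = []
    for raw in document.split(SEGMENT_END):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(ELEMENT_SEP))
    return segments

def match_by_location(segments, segment_id, element_index):
    """Location-style match: values at a fixed segment/element position."""
    hits = []
    for seg in segments:
        if seg[0] == segment_id and len(seg) > element_index:
            hits.append(seg[element_index])
    return hits

def match_by_content(segments, pattern):
    """Content-style match: every element value that fully matches a regex."""
    regex = re.compile(pattern)
    return [value for seg in segments for value in seg[1:] if regex.fullmatch(value)]

# Hypothetical sample data, not taken from the article's screenshots.
sample = "NM1*IL*1*DOE*JANE~N3*123 MAIN ST~PER*IC*JANE DOE*TE*5551234567~"
segments = parse_segments(sample)
street_hits = match_by_location(segments, "N3", 1)   # by position
phone_hits = match_by_content(segments, r"\d{10}")   # by pattern
```

The content-style search has to test every element in every segment, which is why adding data matchers widens coverage but also lengthens the search.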
Prerequisites
You may need to update your DarkShield API and Workbench IDE; see this YouTube video for instructions, and email darkshield@iri.com if you need help or have any questions. Before launching the DarkShield Healthcare wizard, ensure these preliminary steps are completed:
- First, verify that the DarkShield API distribution directory is specified under Preferences > IRI > DarkShield in IRI Workbench. From here, you can configure DarkShield GUI and API preferences, including the host, port, and directory holding the DarkShield API. If the DarkShield API distribution (Plankton) field is empty, or if you are using a different distribution, you will need to manually update the specified folder to reflect the location of the DarkShield API.
- Second, all DarkShield Wizards require a project possessing an IRI Data Class and Rule Library. The library can be empty, as this wizard can create data classes and rules for you. Learn about data classes and rules from this article.
IRI Project Containing an IRI Data Class and Rule Library
- Finally, verify that the Plankton (DarkShield API) server is running. The DarkShield API Status view (bottom of the screen) displays whether the API server is running.
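If you want to double-check from outside Workbench that something is listening at the configured host and port, a plain TCP connection test is one option. The sketch below is only an illustration: the host and port are placeholders rather than documented DarkShield defaults, and a successful connection shows that a server accepts connections there, not that it is the DarkShield API.

```python
import socket

def api_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

# Hypothetical values: copy the real host and port from the Workbench preferences.
HOST = "localhost"
PORT = 8080

if __name__ == "__main__":
    print("reachable" if api_is_reachable(HOST, PORT) else "not reachable")
```

The DarkShield API Status view in Workbench remains the authoritative indicator; this check is only a quick sanity test for scripts or remote machines.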
Using the Wizard
To access the wizard, navigate to the DarkShield menu dropdown from the top toolbar menu in IRI Workbench and choose ‘New Healthcare Search/Masking Job…’.
This action opens the first page, where you will specify the source type, in this case, X12. Wizards for a variety of common FHIR formats will be added here shortly.
After selecting the source type, click Next. The next page will prompt you to specify the name and location of the DarkShield job you are building.
You also have the option to use previously created Data Classes or Data Class Groups. Doing so shortens job setup, because the wizard reuses data classes that were already defined or modified in the library or in a previous run of this wizard.
If this is your first time defining a DarkShield job for X12, however, we recommend not using this option, because the next pages of the wizard take you through the specific types of X12 data segments you need to classify.
After the setup page is the PII General and Advanced Selection Page. Here, predefined “groupings” of common Personally Identifiable Information (PII) or Protected Health Information (PHI) types found within healthcare documents are presented.
It’s important to note that while these groupings cover many PII instances, they may not encompass all of them within a document. Therefore, you should carefully review all information during job creation to ensure the accuracy of your data discovery and de-identification results.
On this page, we can select these groupings for easier searching and masking of PII / PHI:
The “Advanced” section of the wizard gives you more granular control over the segments, fields, and sub-fields to define for searching and masking purposes. For this example, we will be using both a general selection (Street Address) and a few advanced selections.
This next wizard page lets you filter out any unnecessary information from the general grouping selection made on the previous page. Filtering of this kind can improve job speed.
In this example, you’re presented with a list of the segments that make up the ‘Street Address’ general selection. If your documents don’t contain any segments from this list, there’s no need for filtering, as they will be automatically skipped.
However, if you prefer not to mask any of this information, you can simply deselect it from the table. For the sake of this example, we’ll keep all the information selected.
The next wizard page is the ‘Rule Selection Wizard Page,’ where you associate the information selected in previous wizard pages with masking rules. Note that any information lacking a rule will be skipped during the search and masking process.
On this page, you’ll find a dropdown menu for selecting groupings defined on an earlier wizard page, including information from the advanced selections. Once you’ve selected a grouping, you can apply a rule to it by selecting it from the Select Rule menu and clicking on ‘Add Group Rule.’
There is also an optional dropdown menu labeled Segment. This permits the application of a rule solely to the selected segment within the grouping.
Within the table, you can also click on the rule cell and modify it directly, using a dropdown menu to select a rule for a specific item. This provides even more granular control of the masking job.
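The three levels of rule assignment described above (whole grouping, one segment within a grouping, and a single table item) can be pictured with a small data model. This Python sketch is purely illustrative and does not reflect how the wizard or the .dsc file stores its data: the class, the function names, the item labels, and the rule names are all hypothetical, and it assumes that a later assignment simply overwrites an earlier one.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableRow:
    grouping: str                 # e.g. "Street Address"
    segment: str                  # X12 segment ID, e.g. "N3"
    item: str                     # label for the field in that segment
    rule: Optional[str] = None    # None means no rule has been assigned

def add_group_rule(rows, grouping, rule, segment=None):
    """Assign a rule to a whole grouping, or to one segment within it."""
    for row in rows:
        if row.grouping == grouping and (segment is None or row.segment == segment):
            row.rule = rule

def set_item_rule(rows, segment, item, rule):
    """Assign a rule to one specific row, like editing a single rule cell."""
    for row in rows:
        if row.segment == segment and row.item == item:
            row.rule = rule

def rows_to_process(rows):
    """Rows without a rule are skipped during search and masking."""
    return [row for row in rows if row.rule is not None]

rows = [
    TableRow("Street Address", "N3", "Address Line 1"),
    TableRow("Street Address", "N3", "Address Line 2"),
    TableRow("Advanced", "NM1", "Last Name"),
    TableRow("Advanced", "PER", "Phone Number"),
]
add_group_rule(rows, "Street Address", "EncryptRule")          # whole grouping
add_group_rule(rows, "Advanced", "RedactRule", segment="PER")  # one segment only
set_item_rule(rows, "N3", "Address Line 2", "RedactRule")      # single cell
active = rows_to_process(rows)
```

In this example the NM1 row never receives a rule, so it is left out of `active`, mirroring the note that information lacking a rule is skipped.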
The “Create Rule” button allows you to create a rule directly from this wizard. For more information about creating a rule, or what the masking rules are, refer to this article.
Note that if you also have HL7 documents and ran the DarkShield search/mask wizard covered in this article, the segments (and thus data classes) are different from X12; therefore they are not interchangeable. However, if you want to apply the same masking functions to similar groupings, you can at least refer to the saved data class rules from your HL7 wizard when selecting rules for your X12 grouping and segments.
The final option on this page is the “Repeating Segment #” column in the table. This is designed to be used with documents that contain duplicate segments. If there are duplicate segments, you may need to represent them within the table.
The “Repeating Segment #” defaults to the first occurrence of the segment. Below is an example of a document containing two N1 segments. If we wanted to only grab the second repeating segment, we would place a 2 inside the “Repeating Segment #” column.
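The idea of addressing one occurrence among several identical segment IDs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DarkShield's implementation: the function name and sample data are hypothetical, and the delimiters (`~` and `*`) are common choices that a real file would declare in its ISA header.

```python
def nth_segment(document, segment_id, occurrence=1, segment_end="~", element_sep="*"):
    """Return the elements of the nth segment with the given ID, or None.

    occurrence is 1-based and defaults to the first match.
    """
    if occurrence < 1:
        raise ValueError("occurrence is 1-based")
    seen = 0
    for raw in document.split(segment_end):
        elements = raw.strip().split(element_sep)
        if elements[0] == segment_id:
            seen += 1
            if seen == occurrence:
                return elements
    return None

# Hypothetical document with two N1 segments.
sample = "N1*PR*ACME INSURANCE~N3*1 PAYER PLAZA~N1*PE*JANE DOE CLINIC~N3*22 ELM ST~"
first = nth_segment(sample, "N1")                 # default: first occurrence
second = nth_segment(sample, "N1", occurrence=2)  # second N1 only
```

Here `first` holds the payer's N1 segment and `second` holds the provider's, so a rule tied to occurrence 2 would leave the payer name untouched.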
The last two wizard pages are used to define or modify the locations of your unmasked file sources and masked targets, respectively. You can specify local PC or LAN file system paths or cloud buckets.
By default, each target file will have the same name as the source. If you specify more than one target, the masked files will be copied into every defined folder.
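The default naming behavior can be expressed as a simple mapping from each source file to one same-named path per target folder. The Python sketch below only models that mapping; the function name and the paths are hypothetical, and it does not copy files or reproduce how DarkShield writes its targets.

```python
from pathlib import PurePosixPath

def target_paths(source_files, target_folders):
    """Map each source file to one same-named path in every target folder."""
    plan = {}
    for source in source_files:
        name = PurePosixPath(source).name  # keep the source file name
        plan[source] = [str(PurePosixPath(folder) / name) for folder in target_folders]
    return plan

# Hypothetical source file and target folders.
plan = target_paths(
    ["/data/in/claim_837.x12"],
    ["/data/masked", "/backup/masked"],
)
```

With two target folders, the single source file maps to two masked copies, one in each folder, both named `claim_837.x12`.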
When you click Finish, a DarkShield job configuration (.dsc) file is created in the “Project Explorer” folder you specified on the first wizard page. A new X12 data class group will also be created inside the “iriLibrary.dcrlib” in your project, which contains the data classes you just defined in the wizard.
Within these Data Classes, every detail can be altered without the need to revisit the wizard. This flexibility enables you to modify, add, or remove any Data Classes, significantly reducing setup time for future jobs, especially if requirements change.
Double-click on your .dsc file to see the ‘DarkShield Job Details Page’. Review the source, target, and data class information. It’s important to note that any changes made to items in the .dcrlib file require selecting the ‘Modify’ button and completing the wizard to ensure those changes are reflected in the .dsc file.
To run your job, right-click on the .dsc file and select the type of job you want. In this case, we are choosing a combined search and masking job.
Upon examining the sample X12 source and target file below, we can see the PII masked in various ways, including pseudonymization and format-preserving encryption (FPE) for realism. Many masking (and synthesis) functions are supported; see this page and its links for details.
In the example below we chose to encrypt the street address, pseudonymize the name, and redact the phone number. Note there are other PHI elements still visible in the document; we left them alone to showcase the granularity and control you have over masking this document.
Sample X12 Source File:
Masked Target:
In addition to the masked files, an audit log file called results.json is generated. It details which data classes matched on information in the X12 file(s) and the corresponding rules used to mask it.
It is also possible to generate an aggregate visualization of the discovered PII, and whether it was masked, in built-in dashboard charts per this article. For help using this wizard to scan and/or mask X12 data, or data in any other source(s), send an email to darkshield@iri.com.