Re-ID Risk Scoring
To comply with the Health Insurance Portability and Accountability Act (HIPAA), you must de-identify both key identifiers and quasi-identifiers. Key identifiers are unique PHI values like name and Social Security Number, while quasi-identifiers are attributes like age, race, state, gender, and occupation, which are not unique on their own but can be used in tandem to identify someone.
The HIPAA Expert Determination Method requires that the chance of re-identifying a person in a data set be statistically very small. This affects HIPAA-covered business associates who wish to make use of this data and need to determine which values they must modify.
Similarly, FERPA regulations protecting student data privacy call these attributes indirect identifiers, which 34 CFR 99.3(f) describes as "other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person ... to identify the student with reasonable certainty."
The IRI FieldShield data masking product, and the IRI Voracity data management platform which includes FieldShield, include a graphical job wizard for statistically analyzing and scoring the re-ID risk based on quasi-identifiers in DB or delimited-file rows.
The Risk Scoring wizard in the IRI Workbench graphical IDE for FieldShield and Voracity produces detailed and visual reports that statistically measure the risk of re-identification. These reports score that risk in three modes of attack, and show the number of records in each equivalence class.
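To make the idea of an equivalence class concrete, the following is a minimal sketch in Python, not the wizard's own implementation. It assumes a handful of made-up rows and hypothetical column names, groups the rows by their quasi-identifier values, and treats each record's risk as 1 divided by the size of its class, which is one common way to score re-identification risk. The attack models and formulas used in the FieldShield report are not reproduced here.

```python
from collections import Counter

# Hypothetical sample rows; the column names and values are illustrative only.
rows = [
    {"age": 34, "state": "FL", "gender": "F"},
    {"age": 34, "state": "FL", "gender": "F"},
    {"age": 51, "state": "FL", "gender": "M"},
    {"age": 29, "state": "GA", "gender": "F"},
    {"age": 29, "state": "GA", "gender": "F"},
    {"age": 29, "state": "GA", "gender": "F"},
]
QUASI_IDENTIFIERS = ("age", "state", "gender")


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def risk_summary(rows, quasi_identifiers):
    """Summarize risk, assuming a record's risk is 1 / (size of its class)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    per_record_risk = [
        1 / classes[tuple(row[q] for q in quasi_identifiers)] for row in rows
    ]
    return {
        "classes": len(classes),
        "smallest_class": min(classes.values()),
        "highest_risk": max(per_record_risk),
        "average_risk": sum(per_record_risk) / len(per_record_risk),
    }


summary = risk_summary(rows, QUASI_IDENTIFIERS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sketch a class of size 1 marks a record that is unique on its quasi-identifiers, so it carries the highest possible risk under the stated assumption.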
Another chart provides an interactive look at the different combinations of quasi-identifiers, along with their separation and distinction values to further assess their capacity to re-identify a record.
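As a rough illustration of separation and distinction, the sketch below computes both for every combination of quasi-identifiers in a small, made-up data set. It assumes the commonly used definitions: distinction is the share of distinct value combinations among all rows, and separation is the share of row pairs that differ on at least one of the chosen attributes. The exact formulas behind the wizard's chart are not stated here, so the function and column names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample rows; the column names and values are illustrative only.
rows = [
    {"age": 34, "state": "FL", "gender": "F"},
    {"age": 34, "state": "FL", "gender": "F"},
    {"age": 51, "state": "FL", "gender": "M"},
    {"age": 29, "state": "GA", "gender": "F"},
    {"age": 29, "state": "GA", "gender": "F"},
    {"age": 29, "state": "GA", "gender": "F"},
]
QUASI_IDENTIFIERS = ("age", "state", "gender")


def distinction(rows, attrs):
    """Share of distinct value combinations (1.0 means every row is unique)."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) / len(keys)


def separation(rows, attrs):
    """Share of row pairs that differ on at least one of the attributes."""
    n = len(rows)
    if n < 2:
        raise ValueError("separation needs at least two rows")
    total_pairs = n * (n - 1) // 2
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    # Pairs inside the same group are identical on attrs, so not separated.
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return (total_pairs - same_pairs) / total_pairs


# Score every non-empty combination of quasi-identifiers.
for size in range(1, len(QUASI_IDENTIFIERS) + 1):
    for attrs in combinations(QUASI_IDENTIFIERS, size):
        print(attrs, round(distinction(rows, attrs), 3), round(separation(rows, attrs), 3))
```

Combinations whose values approach 1.0 on both measures single out records more effectively, which makes them candidates for generalization.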
In addition to those interactive graphs, which can be saved in different image formats, FieldShield's re-ID risk determination report provides detailed descriptions of the metrics.
After reviewing the risk scoring report in consultation with a qualified statistician (IRI can refer one), you can create additional FieldShield jobs that generalize or blur one or more of the quasi-identifiers so that they remain useful for research or marketing purposes but are less likely to lead to re-identification. You can then re-score the modified data sets using the attribute model you created in your first pass through the wizard.
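The generalize-then-re-score loop can be sketched as follows. This is plain Python with made-up rows, not a FieldShield job: it assumes one simple form of generalization (replacing an exact age with a 10-year band) and compares the smallest equivalence class before and after. The helper names `generalize_age` and `smallest_class` are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the column names and values are illustrative only.
rows = [
    {"age": 34, "state": "FL"},
    {"age": 36, "state": "FL"},
    {"age": 31, "state": "FL"},
    {"age": 52, "state": "FL"},
    {"age": 58, "state": "FL"},
]
QUASI_IDENTIFIERS = ("age", "state")


def generalize_age(age, width=10):
    """Replace an exact age with a band label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_class(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


before = smallest_class(rows, QUASI_IDENTIFIERS)

# Build generalized copies so the original rows stay untouched.
generalized = [{**row, "age": generalize_age(row["age"])} for row in rows]
after = smallest_class(generalized, QUASI_IDENTIFIERS)

print("smallest class before generalization:", before)
print("smallest class after generalization:", after)
```

A larger smallest class after generalization means no record is as easy to single out as before, while the banded ages still support aggregate analysis. How wide the bands should be is a judgment for the statistician reviewing the report.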
For more information on the wizard, see this article, or contact us through the form below.