Shifting Dates While Preserving Intervals
One of the data masking requirements for IRI FieldShield that we see in PHI anonymization use cases involves the blurring of dates at the row level, instead of the column level while providing a configurable option to retain the interval between those dates. This is typically needed in clinical research or test data scenarios which must maintain specific intervals between hospital admittance or discharge, or treatment start, and end, dates.
Let’s say for example, that a table or file in production contains a ‘StartDate’ and ‘EndDate’ column. The values of both dates need to be anonymized but shifted by the same constant number of days, so that the duration of the event does not change. For each record, a new random number within the desired range can be used.
This image illustrates the requirement:
with the procedure expected to be as follows:
- User selects the input data source (table or file)
- User selects the columns which define the beginning and ending of the interval
- User provides the range of the shift factor (e.g. -10 to +10)
- Software chooses random number within the shift
- Each row has the beginning and ending values shifted by the random amount, preserving the interval
FieldShield scripting handles this with a separately generated random number offset within a given range. One nice feature of this approach is that with our date and time based math functions, the offset can be defined as seconds, minutes, hours, days, weeks, months, or years. The offset can be applied to any time based data type, such as a date, time, or timestamp.
A FieldShield job script to perform the row-based shifting and masking of these dates follows:
In our example, the two dates in each row are shifted by the same random number of days, between plus and minus 10 days.
Notice the virtual field defined in the virtual input record (/INREC) section of the script, called RAND_SHIFT. For each row processed, it randomly generates a new date shifting value from ten fewer to ten greater days.
In the resulting target file, the new start and stop date values are adjusted by the RAND_SHIFT value in days by the “change_dt” function to make sure a proper date is used; see this article.
Here is the output data:
This example processes a CSV file, but the same principle applies to a database table, or any other structured data source supported by the FieldShield data processing engine, SortCL. And of course, different masking functions could be applied to other fields in the same job script; for example, pseudonymization of the names, and/or blurring of the age values.
Below is the same job designed in the free graphical IDE for FieldShield, IRI Workbench. The screenshot shows the diagram and outline views of the job script, and the tab-delimited output:
If you have any questions about this article, or need help implementing a static or dynamic masking solution for privacy law compliance, test data management, or breach risk mitigation, please contact your IRI representative.