Data Risk Mitigation via Data Masking

by Jeff Simpson

Data at Rest is Data at Risk. Mitigate the Risk through Data-Centric Security

The need for data risk mitigation is on the rise in the United States and around the globe. Consider this example: you are at home opening your mail and find a shiny new credit card from your credit card company. The letter gives no real explanation other than “your information might have been at risk, and to prevent theft, we have issued you a new card.”

For the last several years, the theft of personally identifiable information (PII) has been on the rise. More than one in four Americans have had their personal information lost or stolen. It is not only individuals who are at risk. Since 2005, the Privacy Rights Clearinghouse has chronicled reported breaches of client, patient, and employee data (including credit card numbers, social security numbers, birth dates, etc.), intellectual property, and other important records exposed through loss, theft, hacking, etc. This is why Data Risk Mitigation is a crucial consideration in a company’s business planning efforts.

Consider the following cases (a few of the many reported each year) in which data was compromised, and how they might relate to you or your company:

In 2014, of the 331 data breaches reported, six exceeded 10 million (m) records. The largest was eBay, which had more than 145m user emails, passwords, DOBs and addresses hacked from a database.
In 2015, personal details of 191m US voters were found on a publicly available database, 15m T-Mobile customer credit check records were exposed, hackers stole more than 10m records from Sony Pictures, and 37m records were stolen from Ashley Madison’s site.
In 2016, 1.5b login records were reported stolen from Yahoo in 2 prior incidents, 412m at Friend Finder, 360m at MySpace, 43.4m from Weebly, 32m at Twitter, and 22.5m from Foursquare.
In 2017, a Deep Root Analytics cloud database of more than 198m US voters was found unprotected, and River City Media inadvertently exposed 1.37b email addresses and other data in a backup archive.
In 2018, the PII and biometric data of 1.1b Indian residents were exposed when a government portal had a leak. Information on 340m people was left vulnerable on an Exactis public server, and 150m MyFitnessPal app user details were hacked. That was also the year of similar embarrassments at Facebook/Cambridge Analytica, Google+, Cathay Pacific, T-Mobile and Marriott.
In 2019, a hacking forum shared access to a cloud database of, ironically, 773m already-breached email addresses and 22m unique passwords. A Dow Jones watchlist database exposed 2.4m identity records of international politicians and government officials.

Source: https://www.privacyrights.org/data-breach

These are just a few examples illustrating why it is imperative to protect sensitive data where it resides. Basic security practices should be followed to ensure the protection of data at multiple points of entry, control, and exit. Indeed, companies must guarantee that their information systems are not an open target, and they must protect PII in appropriate ways throughout its life cycle. This means exercising a combination of people, process, and procedural measures that leverage technologies for both endpoint and what IRI calls “startpoint security.”

These data-centric “startpoint” protection (a/k/a data masking) requirements prompted IRI to develop functionality for finding and de-identifying PII in files and databases. To that end, IRI offers FieldShield to find and protect data at risk down to the field level in tables and flat files. IRI subsequently developed CellShield to find, classify, and mask PII in multiple Excel spreadsheets at once, and DarkShield to do the same in unstructured text, document, and image files.
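To make the idea of finding PII at the field level concrete, here is a minimal Python sketch. It is not how FieldShield or any IRI product works; the pattern set, the classify_columns function, and the 80% match threshold are all assumptions chosen for illustration. It labels a column of a flat file as a data class when most of its values match a pattern.

```python
import csv
import io
import re

# Hypothetical data classes and patterns, for illustration only.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def classify_columns(csv_text, threshold=0.8):
    """Return {column_name: data_class} for columns that look like PII.

    A column is assigned a class when at least `threshold` of its
    non-empty values match that class's pattern.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    findings = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row[column]]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                findings[column] = label
                break
    return findings


sample = (
    "name,ssn,contact\n"
    "Ann,123-45-6789,ann@example.com\n"
    "Bob,987-65-4321,bob@example.org\n"
)
print(classify_columns(sample))
```

Matching on values rather than column names is a deliberate choice in this sketch, since a column called "contact" or "field7" can still hold sensitive data.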

FieldShield, CellShield and DarkShield offer users a choice – for each item of PII (or data class) – of AES, GPG, or other encryption libraries, data redaction (e.g. rendering a credit card number unreadable except the last 4 digits) and de-identification (e.g. separating or pseudonymizing sensitive information in medical records), hashing and so on … up to 14 different functional categories of protection in the case of FieldShield.
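As a rough illustration of what two of these protection categories do to a field value, the Python sketch below shows redaction of a card number down to its last 4 digits and deterministic pseudonymization with a keyed hash. This is not the IRI implementation; the function names (redact_card, pseudonymize, mask_record), the "PT-" prefix, and the sample key are hypothetical.

```python
import hashlib
import hmac


def redact_card(number, keep=4, mask_char="*"):
    """Mask every digit except the last `keep`, leaving separators in place."""
    total_digits = sum(ch.isdigit() for ch in number)
    to_mask = max(total_digits - keep, 0)
    out = []
    for ch in number:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def pseudonymize(value, secret_key, prefix="PT-"):
    """Replace a value with a stable pseudonym derived from HMAC-SHA256.

    The same input and key always give the same pseudonym, so joins
    across tables still work, but the original cannot be read back.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return prefix + digest.hexdigest()[:12]


def mask_record(record, rules):
    """Apply a per-field masking rule; fields without a rule pass through."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }


key = b"demo-key-not-for-production"  # assumption: real keys come from a key store
rules = {
    "card": redact_card,
    "patient_name": lambda v: pseudonymize(v, key),
}
record = {"patient_name": "Jane Doe", "card": "4111-1111-1111-1234", "visits": "3"}
print(mask_record(record, rules))
```

Choosing a rule per field mirrors the per-item choice described above: a card number may only need partial redaction for display, while a name that serves as a join key is better served by a consistent pseudonym.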

These functions can be applied to fields in multiple data sources through automatic wizard-driven workflows, and can also be seamlessly invoked within data warehousing, data/DB migration, MDM, and reporting/analytic data preparation operations in the IRI Voracity data management platform. Granular data searching and classification wizards, field-level security functions, re-ID risk determination reporting, and automatic XML job (audit) logs help organizations mitigate data risk, comply with internal and government privacy regulations, and provide safe and realistic test data for DevOps and more.
