What is Data Security Governance?
The Latest Data Risk Landscape
Enterprise data continues to change rapidly in form, size, use, and residence. It rarely remains in siloed constructs anymore, limited to certain business units or untouched by the outside world. Data now freely crosses the previously conceived thresholds that limited business potential. It resides in the cloud, spreads between business units, and flows everywhere.
But for all the change and opportunity that data represents, once it’s created or collected, it is under threat of attack and misuse. With the number of reported data breaches doubling in the last ten years, and half a billion records exposed last year, our reliance on information is under increasing threat from a lack of security.
With the exposure of personal data at industrial scale, the growth of data privacy legislation was inevitable. Companies and government agencies collecting and handling personally identifiable information (PII) must now comply with industry standards like the Payment Card Industry Data Security Standard (PCI DSS), Health Insurance Portability and Accountability Act (HIPAA) requirements in the United States, the General Data Protection Regulation (GDPR) in Europe, and many international and local follow-on laws like POPI in South Africa, the DPDP Act in India, KVKK in Turkey, and the California Consumer Privacy Act (CCPA).
Data breaches also carry explicit costs. A recent Ponemon Institute study found that the average cost of a data breach was US$4.45 million in 2023, and the risk of having 10,000 stolen or lost records is ~26%. With more than a one-in-four chance of losing 10,000 records, would you take that risk if you could use technology to prevent it?
Organizations stuck in old operational models and mindsets fail to recognize the importance of company-wide security protocols. To improve, they must address their need for what Gartner calls Data Security Governance, and thus protect information in a structured and coordinated way, not as an afterthought or as remediation after a breach.
What is Data Security Governance?
Gartner defines data security governance (DSG) as “a subset of information governance that deals specifically with protecting corporate data (in both structured database and unstructured file-based forms) through defined data policies and processes.”
You define the policies. You define the processes. There is no one-size-fits-all solution to DSG. Furthermore, there is no single product that meets all of the needs of DSG. You must look at your data and weigh which areas have the greatest need and the most importance to your company. You take data governance into your own hands to avert disaster. Remember that your information is your responsibility.
While there are multiple pathways to safeguarding data — logical, physical, and human — three primary software methods that IRI customers successfully employ are the classification, discovery, and de-identification (masking) of PII and other data considered sensitive.
Data Classification
To find and protect specific data at risk, you must first define it in named categories or groups. Data so classified can be cataloged not only by its name and attributes (e.g., US SSN, 9 digits), but also by computational validation (to distinguish it from other 9-digit strings) and sensitivity attribution (secret, sensitive, etc.).
In addition to those assignments, data classes or class groups can be characterized by where they are located and/or how they should be found (search method/s) if their locations are unknown. Also possible is the global assignment of a remediation, or masking function, so that de-identification can be carried out consistently for all members of the class, regardless of location, preserving its referential integrity.
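As a rough illustration of the idea, a data class can be modeled as a record that bundles a name, a search pattern, a validation function, a sensitivity label, and a masking rule. The sketch below is a minimal, hypothetical Python example and not IRI's implementation; the names `DataClass`, `looks_like_us_ssn`, and `redact_all_but_last4` are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Callable


def looks_like_us_ssn(value: str) -> bool:
    """Structural checks that separate plausible SSNs from other 9-digit strings."""
    digits = re.sub(r"[- ]", "", value)
    if not re.fullmatch(r"\d{9}", digits):
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # Areas 000, 666, and 900-999, group 00, and serial 0000 are never issued.
    return (
        area not in ("000", "666")
        and area[0] != "9"
        and group != "00"
        and serial != "0000"
    )


def redact_all_but_last4(value: str) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


@dataclass(frozen=True)
class DataClass:
    """Hypothetical container for one class of sensitive data."""
    name: str
    pattern: str                      # how to search for candidate values
    validator: Callable[[str], bool]  # computational validation
    sensitivity: str                  # e.g. "secret" or "sensitive"
    mask: Callable[[str], str]        # one masking rule for every member


US_SSN = DataClass(
    name="US_SSN",
    pattern=r"\b\d{3}-?\d{2}-?\d{4}\b",
    validator=looks_like_us_ssn,
    sensitivity="sensitive",
    mask=redact_all_but_last4,
)
```

Because the masking rule is attached to the class rather than to a particular table or file, every value identified as a member of `US_SSN` would be masked the same way wherever it is found.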
Data Discovery
To find sensitive data, you can execute search functions that may or may not be associated with data classes. Examples of discovery techniques include:
- RegEx or Perl Compatible Regular Expression (PCRE) matches
- exact or fuzzy matching to lookup values
- special path or column filtering logic
- named entity recognition (NER)
- signature and facial recognition, etc.
It is also possible to leverage machine learning in the recognition process. IRI supports semi-supervised machine learning in NER model building, for example, in its DarkShield product (below).
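Two of the techniques listed above, pattern matching and fuzzy matching against lookup values, can be sketched with the Python standard library alone. This is an illustrative example under stated assumptions, not how DarkShield or any other IRI product performs discovery: the email pattern is deliberately simple, and the function names and the 0.8 similarity threshold are hypothetical choices.

```python
import re
from difflib import SequenceMatcher

# A deliberately simple email pattern; real-world patterns are usually stricter.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_emails(text):
    """Return every substring of text that matches the email pattern."""
    return EMAIL_PATTERN.findall(text)


def fuzzy_lookup(token, lookup_values, threshold=0.8):
    """Return lookup values whose similarity to token meets the threshold.

    Similarity is difflib's ratio, a value between 0.0 and 1.0, computed
    case-insensitively. The threshold is an assumed tuning parameter.
    """
    token = token.lower()
    return [
        value
        for value in lookup_values
        if SequenceMatcher(None, token, value.lower()).ratio() >= threshold
    ]


known_names = ["John Smith", "Jane Doe"]
print(find_emails("Contact jane.doe@example.com for access."))
print(fuzzy_lookup("Jonh Smith", known_names))  # tolerates the transposed letters
```

Fuzzy matching trades precision for recall: a lower threshold catches more misspellings but also produces more false positives, so the threshold typically needs tuning against sample data.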
Data De-Identification
One of the ways to reduce, and even nullify, the risk of data breaches is by masking data at rest or in motion, with field-level functions that render it protected but still usable to some extent.
According to Gartner analyst Marc Meunier in “How Data Masking Is Evolving to Protect Data from Insiders and Outsiders”:
Adopting data masking helps organizations raise the level of security and privacy assurance for their sensitive data — be it protected health information (PHI), personally identifiable information (PII) or intellectual property (IP). At the same time, data masking helps meet compliance requirements with security and privacy standards and regulations.
Most enterprises — either by virtue of internal rules or data privacy laws — have been, are now, or will soon be, making data masking a core element of their overall security strategy.
Proven Software Solutions
IRI provides static and dynamic data masking solutions for structured data in:
- relational databases
- flat files
- Excel
- ASN.1 CDR files
- proprietary mainframe and legacy application sources
- big data and cloud platforms (Hadoop, AWS, etc.)
through its FieldShield product or Voracity platform, as well as data at risk in Excel via CellShield.
For PII and other data at risk in those structured sources, as well as in semi-structured and unstructured data sources like:
- NoSQL databases
- free-form text files and application logs
- EDI and semi-structured files (HL7, X12, FHIR, JSON, and XML)
- Parquet files
- MS Office (Word, PowerPoint and Excel)
- PDF documents
- image files (DICOM, BMP, GIF, JPG, PNG, TIF)
you would use the IRI DarkShield product to classify, discover, and de-identify it.
In these data masking tools, you can consistently apply (to preserve data and referential integrity) data masking methods like:
- blurring
- deletion
- encryption
- redaction
- pseudonymization
- hashing
- scrambling
- tokenization
with or without the ability to reverse certain functions. Voracity — which includes those products — also folds data masking into data integration, data quality, and data migration operations, as well as data federation (virtualization), reporting and data wrangling for analytic operations.
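To make the point about consistency concrete, the following minimal sketch shows one common way to get the same masked output for the same input everywhere: a keyed hash (HMAC-SHA-256). It is an assumption-laden illustration, not the algorithm used by FieldShield, DarkShield, or Voracity. The function name `pseudonymize`, the sample rows, and the hard-coded demo key are all hypothetical, and this particular function is not reversible.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would keep keys in a secrets manager.
SECRET_KEY = b"demo-key-for-illustration-only"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Deterministically map a value to a hex pseudonym using HMAC-SHA-256.

    The same input and key always give the same output, so masked values
    still join across tables. Without the key, the mapping cannot be
    recomputed from the original value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Two hypothetical tables that share an SSN column.
customers = [{"ssn": "123-45-6789", "name": "Jane Doe"}]
orders = [
    {"ssn": "123-45-6789", "total": 42.50},
    {"ssn": "987-65-4321", "total": 10.00},
]

masked_customers = [{**row, "ssn": pseudonymize(row["ssn"])} for row in customers]
masked_orders = [{**row, "ssn": pseudonymize(row["ssn"])} for row in orders]

# The masked key still links Jane Doe to her order, but not to the other one.
print(masked_customers[0]["ssn"] == masked_orders[0]["ssn"])
print(masked_customers[0]["ssn"] == masked_orders[1]["ssn"])
```

A keyed hash suits cases where the original value never needs to be recovered. Where reversal is required, encryption or a token vault would be the more appropriate choice.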
Built-in data discovery, classification, metadata management, and audit logging features facilitate both automatic and manual assessments of the re-identifiability of affected records. See www.iri.com/solutions/data-masking and www.iri.com/solutions/data-governance for more information, and contact your IRI representative if you need help creating or enforcing your DSG framework through a data-centric, or ‘startpoint’ security approach.