Test Data Virtualization

 


What is Test Data Virtualization?

Test data virtualization combines test data generation -- whether from data masking, subsetting or synthesis -- with efficient test data delivery. The operative word is efficiency, since most test data provisioning today still involves creating many physical copies of test tables or files, rather than fewer, more consistent, up-to-date golden copies of test data that can be easily accessed and reset.

Manual, database-centric, and off-the-shelf approaches to test data virtualization have proven time-consuming and costly, resulting in either inadequate testing or delays that set back SLA and delivery dates. And testing with the latest, still-unmasked production data as a fail-safe is simply unsafe.

The ultimate goal is to provide dynamic, more or less on-demand (self-service) test data for database and software application development, systems testing, and outsourcing. The creation and management of virtual test environments containing safe, intelligent test data remains a vexing part of QA and development cycles.

 

Solutions

By leveraging the long-proven synthetic test data generation and subsetting capabilities of the IRI RowGen tool -- or the FieldShield and DarkShield data masking tools also in the IRI Voracity data management platform -- you can satisfy multiple test data management requirements. You can also meet many of the test data provisioning requirements you have through virtual test environments, without the costs or complexity associated with commercial test data virtualization solutions.


One of the inherent benefits of the IRI Voracity data management platform is its combination of robust data integration, test data generation, and data replication capabilities. Together they allow you to create and provision customized, virtual test data solutions quickly and easily for DevOps.

Voracity can combine static and streaming ETL, or real-time incremental database replication, with data masking, subsetting, synthesis, data transformation, and custom formatting. Without impacting live systems or being limited to a particular database or cloud platform, Voracity users can exploit and automate the capture, manipulation, and provisioning of both ad hoc (virtual) and persistent test sets that: reflect production data characteristics, preserve data and referential integrity enterprise-wide (not just database-wide), anonymize PII, and do not get stale.
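To illustrate what preserving referential integrity across sources means when PII is anonymized, here is a minimal, generic Python sketch. It is not IRI code and does not use any Voracity, FieldShield, or RowGen API; the key, field names, and sample records are all hypothetical. It assumes a keyed hash (HMAC) as the masking function, so the same input always produces the same masked output in every source.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration; a real key would live outside the code.
SECRET_KEY = b"demo-key-not-for-production"


def mask_id(value: str, length: int = 10) -> str:
    """Deterministically pseudonymize a value: same input, same masked output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Two hypothetical sources that share a customer key (e.g. a table and a file).
customers = [
    {"cust_id": "C1001", "name": "Ann Lee"},
    {"cust_id": "C1002", "name": "Bo Diaz"},
]
orders = [
    {"order_id": 1, "cust_id": "C1001"},
    {"order_id": 2, "cust_id": "C1002"},
    {"order_id": 3, "cust_id": "C1001"},
]

# Mask each source independently; the join key still lines up afterwards.
masked_customers = [
    {"cust_id": mask_id(c["cust_id"]), "name": "NAME_" + mask_id(c["name"], 6)}
    for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "cust_id": mask_id(o["cust_id"])}
    for o in orders
]


def orphaned_orders(custs, ords):
    """Return orders whose customer key has no match in the customer set."""
    known = {c["cust_id"] for c in custs}
    return [o for o in ords if o["cust_id"] not in known]


print(orphaned_orders(masked_customers, masked_orders))
```

Because the masking is deterministic, no masked order loses its parent customer, even though the two sources were processed separately. Truncating the digest raises the chance of collisions on large data sets, so the output length here is only a demonstration value.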

 

Suggestions

First, consider the business rules driving your need for an ad hoc solution. IRI provides advice on considering them in this series of test data management articles, and several facilities to help you discover the data you have to work with in sources such as files, databases, and dark data documents.

Next, think about what kind of test data you need based on who needs it, and how and where it will be used. You may need to be creative; some test targets benefit from a combination of data masking and synthesis. Or you may want to mask and thus produce realistic test data while:

  • subsetting it from a database environment, or replicating it
  • integrating SQL and file sources, or previewing ETL jobs
  • feeding it to a DevOps (CI/CD) pipeline for test automation
  • refreshing a virtual test database in real-time
  • streaming it from an IoT data broker
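As a generic illustration of the subsetting option in the list above, the following Python sketch pulls a slice of parent rows and only the child rows that reference them, so the subset stays referentially intact. It uses the standard sqlite3 module with an in-memory database; the table names, columns, and filter are hypothetical, and this is not how any IRI tool is invoked.

```python
import sqlite3

# Hypothetical in-memory source database with a parent and a child table.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER REFERENCES customers(cust_id),
    total    REAL
);
INSERT INTO customers VALUES (1,'EAST'),(2,'WEST'),(3,'EAST'),(4,'WEST');
INSERT INTO orders VALUES (10,1,25.0),(11,2,40.0),(12,3,15.5),(13,3,99.0),(14,4,5.0);
""")


def subset(conn, region):
    """Select parents by a filter, then only the children that reference them."""
    parents = conn.execute(
        "SELECT cust_id, region FROM customers WHERE region = ? ORDER BY cust_id",
        (region,),
    ).fetchall()
    children = conn.execute(
        "SELECT o.order_id, o.cust_id, o.total "
        "FROM orders o JOIN customers c ON c.cust_id = o.cust_id "
        "WHERE c.region = ? ORDER BY o.order_id",
        (region,),
    ).fetchall()
    return parents, children


subset_customers, subset_orders = subset(src, "EAST")
print(subset_customers)
print(subset_orders)
```

Driving the child query from the same filter as the parent query is what keeps orphaned rows out of the subset; a masking step could then be applied to the selected rows before they reach a test target.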

Consider also that every Voracity test data generation process allows you to define multiple, differently formatted persistent and virtual targets simultaneously. Such efficiency and flexibility are especially valuable to DevOps teams who need to work in parallel.
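The idea of feeding several differently formatted targets from one generation pass can be sketched generically in Python. This is only an illustration under stated assumptions: the row layout, the generate_rows and write_targets helpers, and the in-memory targets are hypothetical and are not Voracity job syntax.

```python
import csv
import io
import json
import random


def generate_rows(count, seed=42):
    """Yield hypothetical synthetic rows; seeded so reruns are repeatable."""
    rng = random.Random(seed)
    for i in range(1, count + 1):
        yield {
            "id": i,
            "amount": round(rng.uniform(1, 500), 2),
            "region": rng.choice(["EAST", "WEST"]),
        }


def write_targets(rows, csv_out, jsonl_out):
    """Write each generated row to a CSV target and a JSON Lines target in one pass."""
    writer = csv.DictWriter(csv_out, fieldnames=["id", "amount", "region"])
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)                      # target 1: delimited file
        jsonl_out.write(json.dumps(row) + "\n")   # target 2: JSON Lines
        written += 1
    return written


# In-memory streams stand in for files, pipes, or database loaders.
csv_target = io.StringIO()
jsonl_target = io.StringIO()
written = write_targets(generate_rows(5), csv_target, jsonl_target)
print(written)
```

Generating once and fanning out means every target receives identical content, which is the consistency property parallel test teams usually care about.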

Once techniques and targets are decided, you can also choose how to design the job(s), how to modify and/or share them, and how and where to run them. Voracity supports multiple job design and runtime methods; see the IRI Workbench section on this page.

Further Advantages

Unlike other virtual TDM solutions, with IRI you do not need to clone databases, set up a virtual TDM appliance, or do anything that complex (or expensive). Test data engineers can serve up as many persistent or virtual copies as they need, and immediately populate their testers' repositories as the test data is generated. However, if you do want a fully masked or synthetic database clone, IRI FieldShield and RowGen jobs can be run as scripts called simultaneously from Actifio, Commvault, and Windocks (virtualized container image) operations!


IRI subsetting, masking, and synthesis jobs for structured data are also supported in Cigniti and Value Lab TDM portals, which help you produce and govern on-demand test data sets for file, DB and API targets. For TDM involving semi-structured (e.g., HL7, JSON, and XML) and unstructured text or file (e.g., PDF, MS Office, and image data) sources, you can use IRI DarkShield to mask them, or to replace real values in them with test data generated by IRI RowGen; see this article.

Finally, the governance of test data can be just as important as the governance of your production data. In addition to the inherent data security governance in Voracity's many static data masking functions, multiple data quality features allow you to validate and stabilize your test data collections, virtual or otherwise. Workflow diagrams and automated batch file generation support graphical design of independent and dependent work chains. And multiple data and metadata lineage options are supported so that you can track the changes to source data and your test data projects.
