GenRocket’s Approach to Synthetic Data Generation

by Louie Flores on Aug 11, 2021

Intense interest in synthetic data was initially triggered by the need for data variety and volume in software testing. Now it’s accelerating with new applications in ETL, Big Data analytics, IoT, and machine learning.

When synthetic data is used for software testing, there are two very different approaches:

  1. A synthetic data replica is produced by scanning and profiling a production data source
  2. Synthetic data is designed and generated dynamically based on test case requirements

The first approach involves scanning and profiling a production database to create a statistically representative synthetic data replica. This results in a secure and private synthetic version of production data that can safely be used for data analytics and business intelligence use cases.


Using a Synthetic Data Replica for Testing


However, statistically representative synthetic data is not well suited to software testing and quality assurance. That’s because a synthetic replica has the same limitations as the original production database from which it was derived. If data patterns, permutations and variations are missing from the production database, they will also be missing from the synthetic replica. In effect, this is an updated version of the traditional test data management (TDM) approach to provisioning test data.
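The limitation described above can be illustrated with a small Python sketch. It is not how any particular profiling tool works. It assumes a deliberately simple "profile" (the relative frequency of each value in one column), and the column contents, the `required_by_tests` set, and the function names are all hypothetical. The point is only that a replica sampled from a profile can never contain a value the production data lacked.

```python
import random
from collections import Counter

def profile(column):
    """Build a frequency profile (value -> relative frequency) of a column."""
    counts = Counter(column)
    total = len(column)
    return {value: n / total for value, n in counts.items()}

def sample_replica(freqs, size, seed=0):
    """Draw a synthetic column that follows the profiled frequencies."""
    rng = random.Random(seed)
    values = list(freqs)
    weights = [freqs[v] for v in values]
    return rng.choices(values, weights=weights, k=size)

# Hypothetical production column: no REFUNDED or CHARGEBACK orders exist yet.
production_status = ["PAID"] * 90 + ["PENDING"] * 9 + ["CANCELLED"] * 1

# Hypothetical set of values the test cases need to exercise.
required_by_tests = {"PAID", "PENDING", "CANCELLED", "REFUNDED", "CHARGEBACK"}

replica_status = sample_replica(profile(production_status), size=1000)

# Values the tests need but the replica cannot supply.
missing = required_by_tests - set(replica_status)
print(sorted(missing))
```

However large the replica is, `REFUNDED` and `CHARGEBACK` always end up in `missing`, because the profile never saw them. Rare values such as `CANCELLED` may also be thin or absent in any given sample.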

With GenRocket, synthetic data is designed by test engineers and developers to meet their individual testing criteria using a self-service platform. Based on their test data design, synthetic data is generated on demand and in real time for each automated test run. This allows testers to have exactly the data they need, when they need it.
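To make the design-driven idea concrete, here is a minimal Python sketch. It does not use or represent the GenRocket platform or its API. The `test_data_design` dictionary, the field names, and the `generate_rows` function are hypothetical, and the sketch assumes the simplest possible design: a list of required values per field, expanded into every combination at the start of a test run.

```python
import itertools

def generate_rows(design):
    """Yield one row per combination of the designed field values."""
    fields = list(design)
    combos = itertools.product(*(design[f] for f in fields))
    for row_id, combo in enumerate(combos, start=1):
        # Each row gets a sequential id plus one value per designed field.
        yield {"id": row_id, **dict(zip(fields, combo))}

# Hypothetical design written by a tester: the values the test cases require,
# including states and boundary amounts that production may never have held.
test_data_design = {
    "status": ["PAID", "PENDING", "CANCELLED", "REFUNDED", "CHARGEBACK"],
    "amount": [0.00, 0.01, 9999.99],
    "currency": ["USD", "EUR"],
}

# Generated fresh for each test run: 5 * 3 * 2 = 30 rows.
rows = list(generate_rows(test_data_design))
print(len(rows), rows[0])
```

Because the rows come from the design rather than from a production profile, every status and boundary amount the tests call for is present by construction.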