Solving the ETL / Big Data Testing Challenge

Big Data has placed digital technology at the center of how modern organizations make business decisions and engage the marketplace. Enterprises are implementing digital transformation initiatives to drive internal and external efficiencies by processing massive amounts of information with the help of machine learning and artificial intelligence systems. A typical environment collects big data through an ETL process from a variety of digital sources into a data lake that in turn feeds a data warehouse along with operational data from enterprise applications.



Test Data Automation + Data Validation = Data Success

The volume and variety of big data make software testing and data validation difficult. Big data is a blend of structured and unstructured data, captured both in large batch files and as real-time data streams.

Data collected from multiple sources can introduce redundancies, inconsistencies and inaccuracies. These defects threaten the quality of the data and reduce the value of the decisions and business outcomes that depend on it.

To solve the big data testing challenge, critical testing objectives must be met:

  • Validate the critical business rules and transformation logic being applied to the data
  • Test large volumes of data in a period of time that will not delay the release schedule
  • Pinpoint where data defects were introduced in the architecture and link them back to the specification or data model
  • Execute data tests as part of a Continuous Integration / Continuous Delivery pipeline

To meet this challenge, GenRocket has joined forces with QuerySurge to provide a solution that combines Test Data Automation and Data Validation to enable fast, easy and comprehensive testing of big data applications.


Using GenRocket’s Test Data Automation platform, your QA staff can generate any volume or variety of synthetic test data on-demand and in real-time. Massive test files containing any variation of structured or unstructured data can be quickly designed by testers and generated dynamically during test operations.
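As a rough illustration of what generating controlled synthetic test data on demand means, the sketch below produces a repeatable CSV file of made-up customer rows using only the Python standard library. It does not use or represent GenRocket's platform or API. The field names (customer_id, region, balance_cents), the region values and the helper functions are hypothetical and chosen only for the example.

```python
import csv
import io
import random

FIELDS = ["customer_id", "region", "balance_cents"]  # hypothetical schema


def generate_customers(count, seed=42):
    """Yield `count` synthetic customer rows; the same seed yields the same rows."""
    rng = random.Random(seed)  # seeded generator keeps the data controlled
    regions = ["NA", "EMEA", "APAC"]
    for customer_id in range(1, count + 1):
        yield {
            "customer_id": customer_id,
            "region": rng.choice(regions),
            "balance_cents": rng.randint(0, 1_000_000),
        }


def write_csv(rows, stream):
    """Write rows to a file-like object as CSV and return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


# Rows are streamed one at a time, so the volume is limited by the
# output target rather than by memory.
buffer = io.StringIO()
rows_written = write_csv(generate_customers(1000), buffer)
print(rows_written, "rows generated")
```

Seeding the generator is the design choice that matters here: a failing test can be rerun against exactly the same data, which is harder to guarantee with sampled production data.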

With the QuerySurge Data Validation solution, data sources and targets are compared to validate that the ETL process is operating correctly. QuerySurge supports many additional big data testing use cases, as illustrated below.
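The idea of comparing a source with its target can be sketched in a few lines of Python using an in-memory SQLite database. This is a minimal sketch, not QuerySurge's implementation or API. The tables, columns and the transformation rule (cents converted to dollars, currency codes upper-cased) are assumptions invented for the example, and one target row is deliberately wrong.

```python
import sqlite3

# Hypothetical source and target tables; order 3 is loaded incorrectly
# (700 cents should become 7.00, not 70.00).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (
        order_id INTEGER PRIMARY KEY, amount_cents INTEGER, currency TEXT);
    CREATE TABLE target_orders (
        order_id INTEGER PRIMARY KEY, amount_dollars REAL, currency TEXT);
    INSERT INTO source_orders VALUES
        (1, 1050, 'usd'), (2, 2599, 'usd'), (3, 700, 'eur');
    INSERT INTO target_orders VALUES
        (1, 10.50, 'USD'), (2, 25.99, 'USD'), (3, 70.00, 'EUR');
    """
)

# Apply the assumed transformation rule to each source row and report
# rows that are missing from the target or do not match it.
MISMATCH_QUERY = """
    SELECT s.order_id
    FROM source_orders AS s
    LEFT JOIN target_orders AS t ON t.order_id = s.order_id
    WHERE t.order_id IS NULL
       OR CAST(ROUND(t.amount_dollars * 100) AS INTEGER) <> s.amount_cents
       OR t.currency <> UPPER(s.currency)
    ORDER BY s.order_id
"""


def find_mismatches(connection):
    """Return the order_ids whose target row violates the transformation rule."""
    return [row[0] for row in connection.execute(MISMATCH_QUERY)]


mismatches = find_mismatches(conn)
print("Mismatched order ids:", mismatches)
```

Because the check returns the offending keys rather than a simple pass or fail, each defect can be traced back to a specific record and rule, and the same script can run as a step in a build pipeline.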



GenRocket and QuerySurge Complete the Data Equation

GenRocket brings the ability to generate a high volume of controlled, real-time synthetic test data while QuerySurge has the ability to validate data and transformation logic as part of a continuous testing process with speed, accuracy and precision. Together they enable a big data testing process that can be integrated into your DevOps pipeline.

The integrated GenRocket/QuerySurge solution enables testers to:

  • Accurately and rapidly generate the data needed for all testing scenarios
  • Mask production data to ensure data security
  • Validate large amounts of data quickly
  • Validate difficult transformation rules between the source and target systems
  • Integrate big data testing into your build pipeline to achieve continuous testing
  • Store historical results and provide analytics

The diagram illustrates this integrated big data testing solution.


To learn more about the GenRocket and QuerySurge solution for big data testing, view the on-demand webinar embedded below, Completing the Data Equation: Data Generation + Data Validation = Data Success.

For more information about QuerySurge, visit the company’s product page here. You can also request a live demonstration of GenRocket’s Test Data Automation solution to learn how it can address your specific software testing requirements.


Testing Data Lake Applications in Financial Services

A multinational banking and financial services corporation required comprehensive test data automation for testing its data lake applications. It focused on synthetic test data generation because of its ability to produce highly controlled data variations in multiple data formats and because of its inherent data security. The corporation was looking for a solution that would meet its needs for automated unit testing, exhaustive functional testing and performance testing.


Meeting the Test Data Challenge for AI and ML

A global services organization developing a new Artificial Intelligence application needed a large set of customer data that followed a specific pattern to train the system.

By using various combinations of Linked Generators, the customer was able to design and model data sets for training and testing an Artificial Intelligence / Machine Learning application.