Synthetic Training Data from GenRocket Enables Highly Accurate Machine Learning Models for Anomaly Detection

by admin on Mar 16, 2023

OJAI, CA – Anomaly detection is one of the fastest-growing AI-assisted applications being deployed by organizations across the globe, in areas such as:

  • Fraud detection (bank/credit transactions and insurance claims)
  • Cybersecurity (intrusion detection for networks and systems)
  • Medical informatics (diagnosis and disorder detection)
  • Fault detection (many commercial and industrial use cases)

According to a recent study by Allied Market Research, the global fraud detection & prevention market alone was valued at $29.80 billion in 2021, and is projected to reach $241.23 billion by 2031, growing at a CAGR of 23.8% from 2022 to 2031.

Anomaly detection is the process of identifying patterns, behaviors, and data points that deviate significantly from normal conditions or expected results. Highly accurate anomaly detection algorithms are required to correctly identify outliers without raising false positive alarms or missing real anomalies (false negatives). This is a tough challenge for machine learning models, but one that can be overcome when the models are properly trained and tested with massive volumes of precisely controlled and conditioned synthetic training data.
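To make the terms false positive and false negative concrete, the short Python sketch below counts both for a detector's output and derives precision and recall from them. The label lists and function names are invented for illustration only; they are not drawn from any GenRocket product or customer data.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels where 1 = anomaly and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomaly correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomaly
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal correctly ignored
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision: share of alarms that were real. Recall: share of anomalies caught."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and detector output for eight records.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because real anomalies are rare, precision and recall are usually more informative here than raw accuracy, which a detector can inflate simply by never raising an alarm.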

GenRocket has developed new intelligent data generators for its Test Data Automation platform designed to meet the complex data generation requirements of machine learning. The new data generators are able to produce controlled and conditioned synthetic training data for simulating highly complex statistical data distribution profiles. This allows AI-assisted application developers to define the rules and statistical profiles needed to accurately train the algorithms in an anomaly detection system. GenRocket makes it possible to generate billions of rows of synthetic training data with precise control over the volume, variety (patterns, permutations, positive and negative data) and the output format required to accurately train machine learning models.
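As a rough illustration of what rules-based generation with a controlled anomaly share can look like, here is a minimal Python sketch. It does not use or represent GenRocket's platform or API. The function name, field names, value ranges, and the 2% anomaly rate are all assumptions chosen for the example.

```python
import random


def generate_transactions(n_rows, anomaly_rate, seed=0):
    """Return n_rows synthetic transactions with a fixed share of labeled anomalies.

    All rules and value ranges below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    n_anomalies = round(n_rows * anomaly_rate)
    labels = [1] * n_anomalies + [0] * (n_rows - n_anomalies)
    rng.shuffle(labels)  # scatter the anomalies through the data set

    rows = []
    for i, label in enumerate(labels):
        if label:
            # Assumed rule: anomalous transactions are unusually large.
            amount = rng.uniform(5_000, 50_000)
        else:
            # Assumed rule: normal amounts follow a bell curve, clamped to [1, 1000].
            amount = min(max(rng.gauss(120, 40), 1.0), 1_000.0)
        rows.append({"id": i, "amount": round(amount, 2), "is_anomaly": label})
    return rows


rows = generate_transactions(n_rows=10_000, anomaly_rate=0.02)
anomaly_count = sum(r["is_anomaly"] for r in rows)
print(f"{len(rows)} rows, {anomaly_count} labeled anomalies")
```

Fixing the anomaly count up front, rather than sampling it, is one simple way to keep rare cases from being underrepresented in the training set.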

“Existing synthetic data tools build training data by discovering the statistical distribution profiles of a production database in order to generate a synthetic replica,” commented GenRocket CEO and Co-founder Garth Rose. “However, a common problem with a synthetic replica is it reproduces any biases or data quality issues (e.g., corrupt, inconsistent, or missing data) and almost always underrepresents the anomalies that can represent fraudulent behavior.”

One of the first applications of GenRocket’s synthetic training data solution was deployed by one of the industry’s largest software companies. The company wanted to help governments in more than 160 countries detect tax fraud by using artificial intelligence technology in a sophisticated tax fraud detection system. Because this was a “greenfield” application, there was no production database to copy, so the company needed a synthetic data generation platform that could generate machine learning training data from a detailed, written specification. It also needed precise control over the data generation profile so that statistically accurate training data could be produced in huge quantities.

GenRocket’s Test Data Automation platform was used to generate synthetic training data containing 990,000 transactions for 1,440,090 organizations to cover the tax provisions for all supply chain scenarios. The solution demonstrates how GenRocket’s rules-based synthetic training data can be an extremely effective choice not only for fraud detection, but for a wide range of anomaly detection applications in financial services, healthcare, cybersecurity, manufacturing, and many other industries. To learn more about the fraud detection system described above, read the case study.

About GenRocket

GenRocket is the leader in synthetic Test Data Automation (TDA), high-performance technology for automating the design of synthetic data for software testing and machine learning requirements. GenRocket’s patented, groundbreaking platform accelerates test data provisioning by more than 1,000% as it improves data quality and test coverage while reducing cost and ensuring data privacy. Headquartered in Ojai, California, GenRocket operates in global markets through a network of systems integration partners and has customers in more than 14 vertical markets including financial services, insurance, and healthcare.

For more information, visit or follow us on LinkedIn, Twitter and Facebook.
