A Synthetic Test Data Approach to Database Migration Testing by Stephen Chisa on Oct 02, 2017
A Synthetic Test Data Approach to Database Migration TestingFor numerous software applications that have a successful shelf life over many years and many version releases, there may come a time when major changes in the application’s database relational model are warranted. Major changes to the relational database model normally necessitate the arduous task of migrating legacy data from tables and columns within the old relational database model to tables and columns in the new relational model.
Legacy Data Migration StepsThere may be many steps to performing a legacy data migration, but the underlying steps are the following:
- Define what tables from the old relational model need to be migrated
- For each table:
- Define what table or tables it is to be migrated to
- Map each column from the old table to a column in a new table
- Define the sequence in which each table should be migrated to ensure data referential integrity during the migration
- Test, test and test again that the migration works before executing actual production legacy migration to the new relational database model
Database Migration TestingThe fourth step, TESTING, is the most crucial step in the process towards guaranteeing to the success of a legacy migration; for without adequate and accurate testing of a resulting migration, it is nearly impossible to validate that every column from every row in every table migrated was successfully saved to a table, row and column in the new relational model. If even one column in one row of data is migrated incorrectly, the negative effects could be minor to catastrophic.
The Challenges of Testing with Production DataCurrently, most migration testing is accomplished with all or a subset of data from the actual production legacy database. Using production data to test a legacy migration comes with all sorts of challenges. Three core challenges using production data are the following:
- Production data is subject to how the data was entered by users in the production environment and thus does not lend itself to clear patterns of data that can be easily quantified and tested. With the exception of automated processes that act upon production data to create new production data (e.g. annual payment schedule, weekly calendar, etc.), production data is by nature relational in structure but random in pattern unless enough user data is entered consistently to establish a pattern.
- Production data contains sensitive data that must be masked and pruned before the data can be used in the testing environment; however, the pruning of sensitive data could affect the outcome of algorithms that act upon the data because the value of the data has been modified.
- Production data may require quite a bit of query analysis in order to find subsets of data that contain a consistent enough set of data to establish a pattern of data that can be used to test and validate the migration process.
The GenRocket Approach to Test DataWith GenRocket’s data modeling and synthetic test data generation platform, the challenges that come with using production data to test the migration process can be eliminated and approached in completely new way. It goes to the approach of how GenRocket allows you to manage test data:
- GenRocket allows you to define Domains that model your production database tables.
- Domains contain Attributes that define the characteristics/columns of your database tables.
- Attributes are assigned GenRocket Generators that generate different types of test data to mimic the data in your production database.
- GenRocket Generators can even be linked together to generate complex but consistent patterns of test data.
The Benefits of Testing Database Migration with GenRocketWhat does this new approach to test data mean for testing a migration for legacy relational data models?
- You can fully model legacy tables as GenRocket Domains.
- You can fully control the patterns of data that is generated for each column in a legacy table be it, primaryId’s, date ranges, predictable and fixed calculations, boolean conditions and even controlling the percentage of data generated over given ranges of data to produce predictable statistics.
- You can fully control how much data is generated for a given test, from one row of data in the first test, to an ever-increasing quantity of data with each subsequent test.
Also published on Medium.