What Is Test Data Generation And Why You Should Care
by Gregg Bolinger on May 10, 2013

I’m going to go out on a limb and make a bold statement:  testing software is hard.  Okay, not that bold of a statement since we all know that.  No matter what language or platform, software has to be tested thoroughly.  Whether you’re delivering a product to a small company or trying to be the next Twitter, lack of good testing will cost you.  That cost might be disguised as time or customers, but at the end of the day, the cost hits someone’s bank account.

As architects and engineers, we solve problems.  Over the years, we’ve made great strides in making software easier to test.  We have frameworks that make it easier to mock methods for unit testing.  We have in-memory databases for integration tests.  We even have browser automation tools that make functional testing fun again.

What do all these testing methods need?  Test data.  Test Data Generation is the means by which a system automatically generates data so that all of our tests have something to assert against.  Unfortunately, this concept is lost on most software shops.  Some of us are populating Excel files with sheet after sheet of data to be read in by our tests.  Some of us are taking snapshots of the production database, hopefully scrubbing them of sensitive data, and loading them into a test database.  Talk to five different software engineers and you’ll get five different methods by which they get test data.  And for the most part, these methods are flawed.
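To make that definition concrete, here is a minimal sketch in Python of what "generated automatically by a system" could look like.  It is only an illustration under stated assumptions:  the `CUSTOMER_RULES` table, its field names, and the `generate_records` helper are all hypothetical and do not come from any particular tool.  The idea is that the rules for each field are declared once, and a seeded random source turns them into as many records as a test needs.

```python
import random

# Hypothetical field rules for an imaginary "customer" record.
# Each rule takes a random source and returns one value for its field.
CUSTOMER_RULES = {
    "name": lambda rng: rng.choice(["Ada", "Grace", "Linus", "Margaret"]),
    "age": lambda rng: rng.randint(18, 90),
    "active": lambda rng: rng.random() < 0.8,
}


def generate_records(rules, count, seed=42):
    """Generate `count` records by applying every rule to a seeded source.

    The same rules and seed always produce the same records, so the data
    can be rebuilt on demand instead of being stored between test runs.
    """
    rng = random.Random(seed)
    return [
        {field: make_value(rng) for field, make_value in rules.items()}
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Print a few generated customers as a quick demonstration.
    for record in generate_records(CUSTOMER_RULES, count=3):
        print(record)
```

Because the records are derived from the rules and the seed, a test can call `generate_records(CUSTOMER_RULES, 100)` whenever it needs data, and a change to the business rules means editing the rule table rather than repairing a spreadsheet or a database snapshot.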

Here are 5 questions to ask yourself about your test data:

  • Does my test data take up gigabytes of space when unused?
  • Does my test data become stale when the rules change?
  • Do I write foreach loops and conditional statements to create my test data?
  • Am I the only one with access to the test data?
  • Do product releases create havoc for my test data?


Test Data Generation is most often an afterthought when it comes to testing, yet test data is just as important as any assertion in your test.  Without the proper test data, what are you testing?  If you answered yes to any of the questions above, you should evaluate how you’re generating your test data.