How a Single Bug Can Trigger a Massive Outage
by Louie Flores on Jun 16, 2021
The Guardian recently reported a massive Internet outage that affected many prominent websites, including Amazon, CNN, Hulu, and The New York Times, as well as its own online news site. The outage lasted one hour and left millions of online visitors with nothing more than an obscure message: “Error 503 service unavailable”. For major online retailers like Amazon, the financial impact is measured in thousands of dollars in lost business for each second of downtime.
Quality assurance professionals will recognize this as a combinatorial testing problem. When testing software with different combinations and permutations of data values, missing just one of them can have unforeseen consequences.
The best practice is to test all combinations and permutations of both valid and invalid data, so that errors in the code are identified before it is released to production. In this article, we describe how the use of real-time synthetic test data could have prevented this massive Internet outage.
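To make the idea of testing every combination of valid and invalid values concrete, here is a minimal sketch in Python. It is only an illustration: the `apply_config` function, its parameters, and the value lists in `DOMAINS` are hypothetical and are not taken from the system involved in the outage.

```python
from itertools import product

# Hypothetical input domains. Each one deliberately mixes valid values
# with invalid ones (-1, None, and the empty string).
DOMAINS = {
    "cache_ttl": [0, 60, -1, None],
    "region": ["us-east", "eu-west", ""],
    "compression": [True, False],
}


def apply_config(cache_ttl, region, compression):
    """Hypothetical system under test: validate a configuration, then accept it."""
    if not isinstance(cache_ttl, int) or cache_ttl < 0:
        raise ValueError("cache_ttl must be a non-negative integer")
    if not region:
        raise ValueError("region must be non-empty")
    return {"cache_ttl": cache_ttl, "region": region, "compression": compression}


def run_all_combinations(func, domains):
    """Call func once for every combination of values in domains."""
    names = list(domains)
    accepted, rejected, crashed = [], [], []
    for values in product(*(domains[name] for name in names)):
        case = dict(zip(names, values))
        try:
            func(**case)
            accepted.append(case)
        except ValueError:
            # Invalid input was rejected cleanly, which is the desired behavior.
            rejected.append(case)
        except Exception as exc:
            # Any other exception is an unhandled failure worth investigating.
            crashed.append((case, exc))
    return accepted, rejected, crashed


if __name__ == "__main__":
    accepted, rejected, crashed = run_all_combinations(apply_config, DOMAINS)
    print(len(accepted), "accepted,", len(rejected), "rejected,", len(crashed), "crashed")
```

With these three small domains there are 4 × 3 × 2 = 24 combinations to run. The count grows multiplicatively with every added parameter or value, which is why a combination is easy to miss when test cases are written by hand.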