I heard a great interview with performance tester Scott Barber. Two things Scott said stayed with me. Here is the first.
Automated checks that record a time span (e.g., existing automated checks hijacked to become performance tests) may not need to resolve to Pass/Fail with respect to performance. Instead, they could simply collect their time spans as data points. These data points can help identify patterns (see the sketch after this list):
- Maybe the time span increases by 2 seconds after each new build.
- Maybe the time span increases by 2 seconds after each test run on the same build.
- Maybe the time span unexpectedly decreases after a build.
- etc.
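Here is a minimal sketch of that idea, assuming a hypothetical CSV file and check name (none of this comes from Scott's interview). The functional check keeps its normal verdict; its elapsed time is only recorded, not judged:

```python
# Hypothetical example: record a check's elapsed time as a data point
# instead of asserting against a Pass/Fail performance threshold.
import csv
import time
from datetime import datetime, timezone
from pathlib import Path

TIMINGS_FILE = Path("timings.csv")  # hypothetical data-point store

def record_time_span(check_name: str, build: str, seconds: float) -> None:
    """Append one (check, build, duration) data point; no performance verdict."""
    is_new = not TIMINGS_FILE.exists()
    with TIMINGS_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "check", "build", "seconds"])
        writer.writerow(
            [datetime.now(timezone.utc).isoformat(), check_name, build, f"{seconds:.3f}"]
        )

def run_checkout_check() -> None:
    """Placeholder for an existing automated check 'hijacked' to be timed."""
    time.sleep(0.1)  # stand-in for the real check's work

if __name__ == "__main__":
    start = time.perf_counter()
    run_checkout_check()  # the functional check still passes or fails as usual
    elapsed = time.perf_counter() - start
    record_time_span("checkout_check", build="1.4.2", seconds=elapsed)
```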
My System 1 thinking tells me to add a performance threshold that reduces each automated check to a mere Pass/Fail. Had I done that, I would have missed the full story, as Facebook did.
Rumor has it that Facebook had a significant production performance bug that resulted from relying on a performance test that didn't report gradual increases in the measured time span. It was only supposed to Fail if performance dropped.
At any rate, I can certainly see the advantage of dropping Pass/Fail in some cases and forcing yourself to analyze collected data points instead.
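Continuing the hypothetical sketch above (same invented timings.csv and check name), the analysis might look something like this: average the recorded durations per build and flag any build-to-build jump of, say, 2 seconds or more:

```python
# Hypothetical follow-up: look for per-build drift in the recorded time
# spans, rather than reducing each run to a single Pass/Fail.
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean

TIMINGS_FILE = Path("timings.csv")  # same hypothetical store as above

def mean_seconds_per_build(check_name: str) -> dict[str, float]:
    """Average the recorded durations for one check, grouped by build."""
    by_build: dict[str, list[float]] = defaultdict(list)
    with TIMINGS_FILE.open(newline="") as f:
        for row in csv.DictReader(f):
            if row["check"] == check_name:
                by_build[row["build"]].append(float(row["seconds"]))
    return {build: mean(values) for build, values in by_build.items()}

def report_drift(check_name: str, threshold: float = 2.0) -> None:
    """Print build-to-build changes; flag jumps bigger than `threshold` seconds."""
    averages = mean_seconds_per_build(check_name)
    builds = sorted(averages)  # assumes build ids sort chronologically
    for previous, current in zip(builds, builds[1:]):
        delta = averages[current] - averages[previous]
        flag = "  <-- worth a closer look" if abs(delta) >= threshold else ""
        print(f"{previous} -> {current}: {delta:+.3f}s{flag}")

if __name__ == "__main__":
    report_drift("checkout_check")
```

Note that the flag here is only a prompt to go look at the data, not a verdict; an unexpected decrease gets flagged just as an increase does.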
Somewhat similarly, on a recent project I used some existing automation as a kind of stability test. Rather than looking for a single pass/fail, I monitored for changes in aggregated test result data over time: Migrate Idea
As you say in your post, I don't think this would work everywhere, but within the constraints I had it was a useful technique.
Nice little post here. However, I'd suggest clarifying the title a little: this isn't really about non-determinism, but rather non-discreteness. You're talking about how a discrete result scheme (pass/fail) may not tell the whole picture (especially in the realm of perf testing), not that the results are suspected to be non-reproducible across repeated runs, no?
Although there may be some overlap between the two concepts, which I suspect is why the title was chosen as it was. :-)