Scientific reasoning is a rare skill. I'm regularly met with bafflement and blank stares when I suggest that we measure something to determine performance, then perform the same workload, on the same hardware, after making a change.
I've literally had people try to draw conclusions from comparisons of test runs with completely different parameters, different data sets, different resources, different versions of the code: absolutely everything varying.
My head just explodes... I want to scream that this isn't how this works; it's not how any of this works.
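The controlled comparison described above can be sketched with the standard library alone. This is a minimal, hypothetical example (the two functions and the input size are made up): the input data, machine, and run counts are held fixed so that only the code under test varies.

```python
import statistics
import timeit

# Fixed input shared by both implementations, so the data set is
# identical across runs and only the code under test differs.
DATA = list(range(10_000))

def baseline():
    return sum(x * x for x in DATA)

def candidate():
    return sum(map(lambda x: x * x, DATA))

def bench(fn, repeats=20, number=50):
    # Repeat the measurement and report the median, which is more
    # robust to scheduler noise than a single run or the mean.
    times = timeit.repeat(fn, repeat=repeats, number=number)
    return statistics.median(times)

print(f"baseline:  {bench(baseline):.6f}s")
print(f"candidate: {bench(candidate):.6f}s")
```

The point is not the specific functions, but that every knob except the change itself stays constant between the two measurements.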
Unfortunately, I often see developers being asked for complex diagnostics in urgent emergencies, when the reaction window is literally something like 30 minutes.
So from a business point of view, there first has to be a quick-and-dirty, imperfect analysis, because that's what emergency decisions depend on.
Sometimes you can just extend the cache lifetime as a quick fix, but sometimes there is no such option.
Programmers like a nice, calm working environment, but business is often a very different place. So please keep the business goals in mind.
And yes, I know there are a lot of bad managers in IT, and you're basically right that we should use precise data. Just one point: when it's available...
The first thing I'm likely to say to my team when an outage occurs is something along the lines of "stop the bleeding". That might mean bypassing the affected service if possible, rolling back a recent release, or reducing the amount of traffic (we're fortunate enough to have most of our traffic coming from sources we can throw a kill switch on).
However we go about it, the first priority is to give ourselves some space to properly analyse the issue and find the real solution without the rest of the business worrying loudly about things being broken.
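A traffic kill switch like the one mentioned above can be as simple as a flag that non-essential traffic sources check before sending work. This is a hypothetical sketch (the flag name, the sources, and the `handle_request` helper are all invented for illustration):

```python
import os

def kill_switch_active() -> bool:
    # Hypothetical flag an operator can flip during an incident,
    # e.g. via the environment or a config service.
    return os.environ.get("TRAFFIC_KILL_SWITCH", "0") == "1"

def handle_request(source: str, essential: bool) -> str:
    # Non-essential traffic is shed while the switch is on,
    # giving the team breathing room to analyse the real issue.
    if kill_switch_active() and not essential:
        return "shed"
    return "served"

os.environ["TRAFFIC_KILL_SWITCH"] = "1"
print(handle_request("batch-importer", essential=False))  # shed
print(handle_request("checkout", essential=True))         # served
```

In practice the flag would live in a config service rather than an environment variable, but the shape is the same: one cheap check on the hot path, one lever for the operator.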
Managing those experiences takes effort, otherwise you're sure to end up with the issues you describe.
Even when the data makes sense, people sometimes hand you an unparsable CSV of results, while the conclusions come from gut feeling. Sadly, not everybody understands standard tools like boxplots, and analyses are often hard to reproduce.
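For what it's worth, the statistics behind a boxplot need nothing beyond the standard library. A minimal sketch, with made-up latency numbers standing in for real measurements:

```python
import statistics

def five_number_summary(samples):
    # The five numbers a boxplot draws: min, Q1, median, Q3, max.
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return min(samples), q1, median, q3, max(samples)

# Hypothetical request latencies in ms; the 55.3 outlier is exactly
# what a boxplot makes visible and a mean quietly hides.
latencies_ms = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 55.3, 12.0]
lo, q1, med, q3, hi = five_number_summary(latencies_ms)
print(f"min={lo} q1={q1:.2f} median={med:.2f} q3={q3:.2f} max={hi}")
```

Even just printing these five numbers next to a claim makes the comparison reproducible in a way a gut feeling never is.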
I think you have a very valid point there, but I would argue it is also important not to react to such situations too strongly.
This is something April Wensel talks about here [1].