Looks like this was a false alarm. The test environment I was using had some random fluctuations in its results, and I assumed they would average out over multiple test runs. It looks as if I got some bad rolls of the dice and they did not.
I now have a modified test that shows <1% variation across four runs, each taking 3 hours. Testing both versions with it, I now see a negligible difference between them.
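
For anyone wanting to run the same kind of sanity check, here is a rough sketch of how run-to-run variation can be compared against the gap between two versions. It assumes "variation" means (max - min) / mean over the runs, which is only one possible definition. The function names and the sample numbers are made up for illustration and are not my actual results.

```python
from statistics import mean


def relative_spread(results):
    """Run-to-run variation as (max - min) / mean. One possible definition."""
    return (max(results) - min(results)) / mean(results)


def meaningful_difference(runs_a, runs_b):
    """True if the gap between the two means exceeds the noisier spread."""
    noise = max(relative_spread(runs_a), relative_spread(runs_b))
    gap = abs(mean(runs_a) - mean(runs_b)) / mean(runs_a + runs_b)
    return gap > noise


if __name__ == "__main__":
    # Hypothetical results from four runs of each version (arbitrary units).
    version_a = [100.0, 100.5, 99.8, 100.2]
    version_b = [100.1, 100.4, 99.9, 100.3]
    print(f"spread A: {relative_spread(version_a):.2%}")
    print(f"spread B: {relative_spread(version_b):.2%}")
    print("meaningful difference:", meaningful_difference(version_a, version_b))
```

With only four runs per version this is a coarse check rather than a proper significance test, but it is enough to show when an apparent difference is smaller than the noise.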