We recently had an incident in which a misbehaving cluster of clients started fetching a 4 MB file from the squid cache at ~1200 RPS (later slowing to 600 RPS). This resulted in up to 2 Gb/s of traffic sent to clients from each of our squid hosts and quickly overloaded squid.
I'm trying to use client_delay_pools to limit per-client bandwidth and prevent misbehaving clients from saturating the client-side network or the CPU on the squid hosts.
However, I can't get it to work reliably. It seems to work as expected for a cache MISS, e.g. downloads are limited to 10 MB/s. But it's completely broken for a cache HIT: the speed I'm getting is ~5 KB/s!
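
For reference, a per-client limit of this kind would typically be configured along the following lines. This is a minimal sketch, not the exact configuration in use: the ACL name `limited_clients`, the source range, the bucket size, and the initial bucket level are placeholders, and the rate simply mirrors the 10 MB/s mentioned above.

```
# Hypothetical ACL matching the clients to throttle (placeholder range)
acl limited_clients src 10.0.0.0/8

# One client-side delay pool
client_delay_pools 1

# Start each client's bucket full (percentage of the max bucket size; assumed value)
client_delay_initial_bucket_level 100

# pool-id  rate (bytes/s)  max bucket size (bytes)
# 10000000 bytes/s is roughly 10 MB/s; the bucket size is an assumed value
client_delay_parameters 1 10000000 20000000

# Apply pool 1 to the matching clients
client_delay_access 1 allow limited_clients
```

With settings like these, both HIT and MISS responses to a matching client would be expected to be capped at about the same rate, which is what makes the ~5 KB/s HIT behavior surprising.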