High memory usage associated with ssl_bump and broken clients
I've identified a problem with Squid 3.5.26 using a lot of memory when
some broken clients are on the network. Strictly speaking this isn't
really Squid's fault, but it is a denial of service mechanism so I
wonder if Squid can help mitigate it.
The situation is this:
Squid is set up as a transparent proxy performing SSL bumping.
A client makes an HTTPS connection, which Squid intercepts. The client
sends a TLS client handshake and Squid responds with a handshake and the
bumped certificate. The client doesn't like the bumped certificate, but
rather than cleanly aborting the TLS session and then sending a TCP FIN,
it just tears down the connection with a TCP RST packet.
Ordinarily, Squid's side of the connection would be torn down in
response to the RST, so there would be no problem. But unfortunately,
under high network loads the RST packet sometimes gets dropped and as
far as Squid is concerned the connection never gets closed.
The busted clients I'm seeing the most problems with retry the
connection immediately rather than waiting for a retry timer.
1. A connection that hasn't completed the TLS handshake doesn't appear
to ever time out (in this case, the server handshake and certificate
exchange has been completed, but the key exchange never starts).
2. If the client sends an RST and the RST is lost, the client won't send
another RST until Squid sends some data to it on the aborted connection.
In this case, Squid is waiting for data from the client, which will
never come, and will not send any new data to the client. Squid will
never know that the client aborted the connection.
3. There is a lot of memory associated with each connection - my tests
suggest around 1MB. In normal operation these kinds of dead connections
can gradually stack up, leading to a slow but significant memory "leak";
when a really badly behaved client is on the network it can open tens of
thousands of connections per minute and the memory consumption brings
down the server.
4. We can expect similar problems with devices on flaky network
connections, even when the clients are well behaved.
Connections should have a reasonably short timeout during the TLS
handshake - if a client hasn't completed the handshake and made an HTTP
request over the encrypted connection within a few seconds, something is
broken and Squid should tear down the connection. These connections
certainly shouldn't be able to persist forever with neither side sending
any data.
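
To illustrate the kind of deadline meant here, the following is a minimal Python sketch, not Squid code and not a description of how Squid handles timeouts. It drops any connection whose peer stays silent past a fixed deadline. The names `HANDSHAKE_DEADLINE`, `read_with_deadline` and `handle_client`, the 5 second value and the port are all assumptions made up for the example.

```python
import asyncio

HANDSHAKE_DEADLINE = 5.0  # seconds; illustrative value, not a Squid default


async def read_with_deadline(reader, deadline=HANDSHAKE_DEADLINE):
    """Return the next chunk from the peer, or None if it stays silent too long."""
    try:
        return await asyncio.wait_for(reader.read(4096), timeout=deadline)
    except asyncio.TimeoutError:
        return None


async def handle_client(reader, writer):
    data = await read_with_deadline(reader)
    if not data:
        # Peer was silent past the deadline (None) or closed cleanly (b""):
        # release the per-connection state instead of waiting forever.
        writer.close()
        await writer.wait_closed()
        return
    # A real server would continue the handshake with `data` here.
    writer.close()
    await writer.wait_closed()


async def main(host="127.0.0.1", port=8443):
    # Hypothetical listener address, chosen only for the example.
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` starts the listener. The point of the sketch is only that the deadline applies to the silent phase, so a lost RST costs at most a few seconds of held state.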
I wrote a Python script that makes 1000 concurrent connections as
quickly as it can and sends a TLS client handshake over them. Once all
of the connections are open, it then waits for responses from Squid
(which would contain the server handshake and certificate) and quits,
tearing down all of the connections with an RST.
It seems that the RST packets for around 300 of those connections were
dropped - this sounds surprising, but since all 1000 connections were
aborted simultaneously, there would be a flood of RST packets and it's
probably reasonable to expect a significant number to be dropped. The
end result was that netstat showed Squid still had about 300 established
connections, which would never go away.
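
The script itself is not included in this message. For anyone wanting to reproduce the test, a minimal sketch along the same lines might look like the following. It is a reconstruction under assumptions, not the original script: the function names, the `example.com` server name and the sequential connect loop are all made up for illustration, and the `SO_LINGER` struct layout assumes Linux.

```python
import socket
import ssl
import struct


def client_hello_bytes(server_name="example.com"):
    """Return the raw bytes of a TLS ClientHello for server_name."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the ClientHello is now queued in `outgoing`
    return outgoing.read()


def abort_with_rst(sock):
    """Close sock so that the kernel sends an RST instead of a FIN."""
    # SO_LINGER with l_onoff=1, l_linger=0 makes close() reset the connection.
    # The "ii" layout (two C ints) is the Linux struct linger.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def run(host, port, count=1000, server_name="example.com"):
    """Open `count` connections, send a ClientHello on each, then reset them all."""
    hello = client_hello_bytes(server_name)
    socks = []
    for _ in range(count):
        s = socket.create_connection((host, port), timeout=10)
        s.sendall(hello)
        socks.append(s)
    for s in socks:
        try:
            s.recv(65536)  # server handshake and certificate, discarded
        except OSError:
            pass  # timeout or early reset; carry on with the rest
    for s in socks:
        abort_with_rst(s)
```

Calling `run(host, 443)` against an address whose traffic the proxy intercepts, then checking netstat on the Squid host, should show whether any connections remain established. The per-process open file limit may need raising for larger values of `count`.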
Re: High memory usage associated with ssl_bump and broken clients
On 09/09/17 04:37, Steve Hill wrote:
> I've identified a problem with Squid 3.5.26 using a lot of memory when
> some broken clients are on the network. Strictly speaking this isn't
> really Squid's fault, but it is a denial of service mechanism so I
> wonder if Squid can help mitigate it.
AFAIK every connection opened or accepted by Squid does have a timeout,
though some of them are long. The mitigation is probably to reduce
request_timeout (v2+) or better the request_start_timeout (v4+).
Please bring up your research on squid-dev mailing list so the guys
working on TLS/SSL and QA can all see it.
You may also need to update the network's congestion control algorithms
to ones that better handle RST packets.