Squid and CPU 100%

Squid and CPU 100%

Vieri
Sorry to bring this back up, but every few days I need to restart Squid because its CPU usage climbs to 100% and stays there for a long time (it becomes unresponsive).

Right before restarting Squid:

# free                                          [11/07/17 11:18:52]
              total        used        free      shared  buff/cache   available
Mem:       32865056    14811320     1374212      157028    16679524    17557656
Swap:      37036988      287320    36749668

# squidclient mgr:info [11/07/17 11:18:52]

# squidclient mgr:mem [11/07/17 11:19:02]

# squidclient mgr:storedir [11/07/17 11:19:22]

# squidclient mgr:filedescriptors [11/07/17 11:19:42]

# squidclient mgr:events [11/07/17 11:20:12]

In other words, squidclient gets no information back from Squid (I waited up to 30 seconds per command; maybe I need to wait longer).

I also dumped an strace here:
https://drive.google.com/file/d/1yed-sNWrc6KBQSOV7KPbrYOForwwtkCe/view?usp=sharing

You probably won't be able to draw any conclusions from it, so I'll try updating to squid-3.5.27-20171101-re69e56c (I'm currently running 3.5.27-20170916-re69e56c).

Note that stopping Squid with:

/usr/sbin/squid -k shutdown -f /etc/squid/squid.conf -n squid

takes a LONG time, but it does eventually stop (the process dies on its own).

I can then restart it cleanly.

I don't need to kill the PID by hand (unless someone is pressing me to restore service quickly).
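If I understand the docs correctly, the shutdown delay is normally governed by squid.conf's shutdown_lifetime directive (default 30 seconds), so something like the line below should make -k shutdown return faster, at the cost of cutting off active clients sooner. The value is just an example, and given the 100% CPU state, the slowness here may well be caused by something else entirely:

shutdown_lifetime 5 seconds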

After checking the log I can see these messages:

2017/11/07 11:16:59 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:16:59 kid1| comm_open: socket failure: (24) Too many open files

Note that max_filedescriptors is set quite high, and the limit was applied correctly both before and after the issue I'm reporting:

# tail -n 100000 /var/log/squid/cache.log | grep -i descript
2017/11/06 06:36:14 kid1| With 32768 file descriptors available
2017/11/07 11:15:15 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:15:31 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:16:13 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:16:59 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:17:35 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:17:51 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:18:07 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:18:25 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:18:45 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:19:03 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:19:19 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:19:35 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:19:53 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:20:09 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:20:28 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:20:44 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:21:01 kid1| WARNING! Your cache is running out of filedescriptors
2017/11/07 11:23:05 kid1| With 32768 file descriptors available

So I'm worried that 32768 may not be enough.
Is this weird, or should I really increase this value?
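For reference, if raising it turns out to be the answer, I assume I'd bump the directive in squid.conf and double-check what the kernel actually grants the running process (the value and commands below are just an example):

max_filedescriptors 65536

# verify the effective limit of the running master process:
grep -i 'open files' /proc/$(pgrep -o squid)/limits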

Thanks,

Vieri
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Squid and CPU 100%

Alex Rousskov
On 11/07/2017 04:42 AM, Vieri wrote:
> So I'm worried that 32768 may not be enough.
> Is this weird, or should I really increase this value?

Think about the underlying physics of what you are observing. It may
help reduce guessing and guide you towards a solution:

You can estimate a reasonable number of file descriptors from your
expected maximum request rate and mean response time. Add, say, 30% to
account for long-lived persistent connections. Remember that Squid uses
one descriptor for the from-client connection and one for the
to-server connection. If that estimate is way below 32K, then the
current limit itself is not the problem. Otherwise, it probably is (and
then you probably need more Squid workers, not more descriptors per worker).
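
For example, with a hypothetical 100 requests/second and a 0.5 second
mean response time:

    100 req/s * 0.5 s = 50 concurrent transactions
    50 transactions * 2 descriptors * 1.3 = 130 descriptors

which is far below 32K.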

* It is possible, perhaps even likely, that some unknown problem
suddenly consumes almost all CPU cycles, drastically slowing Squid down
and quickly consuming all file descriptors (because accepted connections
are not served fast enough).

* It is also possible, albeit less likely, that some unknown problem
slows Squid down over time and slowly leads to excessive file descriptor
use and even 100% CPU usage.

To distinguish the two cases, consider studying transaction response
times and the total number of connections logged every minute of every
hour. You should collect such stats for many other reasons anyway!
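
A rough sketch of the kind of per-minute stats I mean, assuming the
default native access.log format (UNIX timestamp in field 1, response
time in milliseconds in field 2) and GNU awk for strftime():

    awk '{ m = strftime("%Y-%m-%d %H:%M", int($1)); n[m]++; rt[m] += $2 }
         END { for (m in n) print m, n[m] " requests", rt[m]/n[m] " ms mean" }' \
        /var/log/squid/access.log | sort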

Alex.
P.S. I trust you have already checked all system logs for more clues and
found nothing of interest there.

Re: Squid and CPU 100%

Vieri
A quick grep of access.log just before the issue I reported shows that there were 1350 lines during a full minute. So I understand that would mean there were 1350 requests during that minute, even though some of them were denied by squid.conf's policies. So I should estimate less than 2 * 1350. I would use that value anyway and add 30%, ending up with 3510.

I grepped access.log for requests during other time ranges around the time of the reported issue, and I've come up with more or less the same values.

So, correct me if I'm wrong, but raising max_filedescriptors (currently at 32768) won't solve the root cause of the problem.

You mention that maybe another unknown process may suddenly consume almost all CPU cycles, drastically slowing Squid down, and quickly consuming all file descriptors.
That could be the case, since I'm using c-icap + clamav. The clamd process peaks at 100% at times (for a very short while), but I would have to prove that this is the cause. Wouldn't I need to monitor CPU usage by all processes at all times?
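(I guess something like pidstat from the sysstat package could log that, e.g. recording per-process CPU usage every 60 seconds in the background:

pidstat -u 60 > /var/log/pidstat-cpu.log &

but I don't know if that's the best tool for this.)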

Also, wouldn't it be useful to check "squidclient mgr:filedescriptors" every 10 minutes or so? I have the feeling it's steadily growing over time, even when overall CPU usage is low. So the second less likely theory may also be a candidate.

Under which circumstances would "squidclient mgr:filedescriptors" show ever growing numbers even on very low CPU and network usage (squid seems to be very responsive)?

Vieri

Re: Squid and CPU 100%

Alex Rousskov
On 11/07/2017 10:16 AM, Vieri wrote:

> A quick grep of access.log just before the issue I reported shows
> that there were 1350 lines during a full minute. So I understand
> that would mean there were 1350 requests during that minute, even
> though some of them were denied by squid.conf's policies. So I
> should estimate less than 2 * 1350. I would use that value anyway
> and add 30%, ending up with 3510.

You are talking about request rate. To get the number of concurrent
connections, you need to account for response time:

    number of concurrent connections = request rate * mean response time

(with the rate in requests per second and the response time in seconds).

And then add 30% overhead for idle persistent connections. That 30%
number is just a rough guess of mine. You can possibly do (much) better
if you study your actual Squid stats.
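
For illustration, plugging in your 1350 requests/minute with an assumed
1 second mean response time:

    1350 req/min = 22.5 req/s
    22.5 req/s * 1 s = ~23 concurrent transactions
    23 transactions * 2 descriptors * 1.3 = ~60 descriptors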


> So, correct me if I'm wrong, but raising max_filedescriptors
> (currently at 32768) won't solve the root cause of the problem.

Even though you have not computed the right kind of value, I do not
expect you to reach a different conclusion once you fix your formula
(unless your response times are huge). With 1350 requests/minute, your
response times would have to be on the order of minutes to get to 32K
open connections!

Please note that mgr:info and other cheap cache manager pages report
request rates (or equivalent). You do not need to count access log lines
to access and monitor that data.
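
For example, something like this (the exact label varies a bit between
Squid versions) shows the average request rate without touching the
access log:

    squidclient mgr:info | grep -i 'requests per minute'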


> You mention that maybe another unknown process may suddenly consume
> almost all CPU cycles,

I did not. I was talking about Squid problems, not another process.


> Wouldn't I need to monitor CPU usage by all processes at all times?

This is not my area of expertise, but, ideally, one should monitor
server usage (using appropriate system administration/monitoring tools).


> Also, wouldn't it be useful to check "squidclient
> mgr:filedescriptors" every 10 minutes or so? I have the feeling it's
> steadily growing over time, even when overall CPU usage is low. So
> the second less likely theory may also be a candidate.

To monitor the number of file descriptors, use mgr:info. It is a much
cheaper action than mgr:filedescriptors, which has to list every
descriptor in every worker. Using mgr:filedescriptors is a good idea in
many cases, but using it every minute may have a noticeable negative
performance impact on concurrent user transactions.
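
For example, something along these lines should print the descriptor
summary counters (maximum, in use, available) from mgr:info:

    squidclient mgr:info | grep -i 'file desc'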


> Under which circumstances would "squidclient mgr:filedescriptors"
> show ever growing numbers even on very low CPU and network usage
> (squid seems to be very responsive)?

A lot of factors, including idle connection pool warmup, bots, and bugs,
may lead to a steady increase in the number of idle and/or stuck
connections. Each idle or stuck connection does not consume much CPU or
network resources...


Alex.