Squid "suspending ICAP service for too many failures"

Andrea Venturoli
Hello.

On a box I manage, Squid occasionally stops for a few minutes, blaming
a communication error with C-ICAP (running SquidClamAV).

In cache.log I see:
> 2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
> 2021/01/04 14:24:24 kid1| essential ICAP service is suspended: icap://127.0.0.1:1344/squidclamav [down,susp,fail11]

This happens usually once a day, always at the same time.
AFAIK there's no particular job running on the server at that time; I
analyzed squid.log to see whether some client accesses something
specific at that hour of the day, but came up empty.
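
One way to confirm the "always at the same time" pattern is to pull the suspension timestamps out of cache.log and group them by minute of the day. The following is only a sketch: it assumes every log line starts with a "YYYY/MM/DD HH:MM:SS" timestamp, as in the two lines quoted above, and the function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime

MARKER = "suspending ICAP service"


def suspension_times(lines):
    """Return a datetime for every cache.log line that reports a suspension."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        # Assumption: the line starts with "YYYY/MM/DD HH:MM:SS" (19 characters).
        try:
            times.append(datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S"))
        except ValueError:
            continue  # no leading timestamp, skip the line
    return times


def by_minute_of_day(times):
    """Count suspensions per HH:MM so that a daily pattern stands out."""
    return Counter(t.strftime("%H:%M") for t in times)
```

Feeding it the lines of cache.log (for example via `open(path)`) and printing `by_minute_of_day(...)` should show whether the suspensions really cluster around one time of day.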

Obviously I looked into C-ICAP logs, but, again, found no hint of any
error or trouble.


Any suggestion on what to do to investigate this?

  bye & Thanks
        av.
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Squid "suspending ICAP service for too many failures"

Alex Rousskov
On 1/27/21 11:01 AM, Andrea Venturoli wrote:

>> 2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
>> 2021/01/04 14:24:24 kid1| essential ICAP service is suspended:
>> icap://127.0.0.1:1344/squidclamav [down,susp,fail11]

> This happens usually once a day, always at the same time.
> AFAIK there's no particular job running on the server at that time; I
> analyzed squid.log to see whether some client accesses something
> specific at that hour of the day, but came up empty.

Unfortunately, Squid ICAP client does not log some of the failures at
debugging level 0 or 1.


> Any suggestion on what to do to investigate this?

Enable ICAP debugging and study cache.log for relevant messages,
especially just before the "suspending ICAP service" message shown above.

    debug_options ALL,1 93,7

Debugging will produce a lot of cache.log lines that are irrelevant to
you. If necessary, you can enable debugging an hour (or even a minute!)
before the regular failure. This will allow you to detail the last
failure (at least). It is possible that all the 11 failures are the same.
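
Since the debugging output is large, it may help to cut cache.log down to the lines logged just before each suspension. A minimal sketch, assuming the suspension message contains the text quoted above; the function name and the default of 200 context lines are arbitrary choices, not anything Squid defines:

```python
from collections import deque


def lines_before_suspension(lines, context=200,
                            marker="suspending ICAP service"):
    """Yield, for each suspension, the last `context` lines plus the marker line."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    for line in lines:
        if marker in line:
            yield list(recent) + [line]
        recent.append(line)
```

Each yielded list ends with the "suspending ICAP service" line, so the failures that led to the suspension should be somewhere above it.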


HTH,

Alex.

Re: Squid "suspending ICAP service for too many failures"

Andrea Venturoli
On 1/27/21 6:11 PM, Alex Rousskov wrote:

> Enable ICAP debugging and study cache.log for relevant messages,
> especially just before the "suspending ICAP service" message shown above.
>
>      debug_options ALL,1 93,7

Thanks a lot.

As expected, I see Squid connections to C-ICAP starting to time out:
when the number of errors reaches 10, Squid marks the squidclamav service
as "suspended".

No big surprise. Still I don't get any more insight (Is C-ICAP choking?
Why? What data triggers this?).



Is it a really bad idea to raise icap_connect_timeout?
Same for disabling icap_service_failure_limit?

Other hints?

  bye & Thanks
        av.

Re: Squid "suspending ICAP service for too many failures"

Alex Rousskov
On 1/29/21 11:55 AM, Andrea Venturoli wrote:

> I see Squid connections to C-ICAP starting to time out:
> when the number of errors reaches 10, Squid marks the squidclamav service
> as "suspended".

> No big surprise.

IIRC, you did not disclose timeout suspicions before. This explanation
is news to me, and it eliminates several suspects.


> Still I don't get any more insight (Is C-ICAP choking?
> Why? What data triggers this?).

If you are talking about Squid timing out when attempting to establish a
TCP connection with the ICAP server, then this may be as much insight as
you can get from the Squid side. There is no ICAP "data" at that
connection establishment stage. It is a fairly low-level operation that
Squid and c-icap have little control over. The problem is probably
outside Squid.

I do not know much about c-icap, but I would check whether its
configuration or something like crontab results in hourly restarts and
associated loss of connectivity. The network interface or the routing
tables might also be reset hourly for some reason. The ICAP
server/service might be running out of descriptors or memory.

One potentially useful test is to try to connect to the ICAP server
_while the problem is happening_ using telnet or netcat. When Squid
cannot establish a connection, can you? If the ICAP service is not
running on the Squid box, then try this test both from the Squid box and
from the ICAP box.
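
For a failure window that is hard to catch by hand, the same connection test can be scripted. The sketch below makes one TCP connection attempt and reports how long it took; the default host and port are taken from the service URI in the log above, while the function name and the 5-second timeout are only assumptions for illustration. It checks TCP connection establishment only and sends no ICAP request.

```python
import socket
import time


def probe(host="127.0.0.1", port=1344, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts and "connection refused"
        return False, time.monotonic() - start, str(exc)
```

Calling `probe()` every few seconds around the usual failure time, and logging the result with a timestamp, would show whether connections from outside Squid also time out while the problem is happening.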

Packet captures can tell you whether other Squid-ICAP server connections
were active at the time, whether from-Squid SYN packets were able to
reach the ICAP server, etc.

In other words, basic network troubleshooting steps...


> Is it a really bad idea to raise icap_connect_timeout?

Higher timeout will delay HTTP client transactions for longer periods of
time, of course. If you want to go down the road of finding workarounds,
then check whether raising that timeout actually helps. It is not yet
clear (to me) whether the connections just need more time to be
established or are simply doomed.


> Same for disabling icap_service_failure_limit?

This is an essential ICAP service (icap_service bypass=off). I assume
there is no backup service -- no adaptation_service_set in play here. If
so, disabling the limit means that fewer HTTP transactions will be
inconvenienced in the long run than if the service were to be suspended.
Hence, fewer ICAP errors will be delivered to Squid clients.

You can also enable bypass.

Fixing the problem would be a much better solution, of course.


HTH,

Alex.

Re: Squid "suspending ICAP service for too many failures"

Andrea Venturoli
On 1/29/21 8:38 PM, Alex Rousskov wrote:

> IIRC, you did not disclose timeout suspicions before. This explanation
> is news to me, and it eliminates several suspects.

Sorry, I didn't say much in fact.
I took it for granted that it was C-ICAP that stopped answering; I didn't
suspect a Squid bug and had no other idea.



> If you are talking about Squid timing out when attempting to establish a
> TCP connection with the ICAP server, then this may be as much insight as
> you can get from the Squid side.

What I hoped to find in Squid logs was *what* was being passed to C-ICAP
when it locked.
I'll try on the C-ICAP side then.



> I do not know much about c-icap, but I would check whether its
> configuration or something like crontab results in hourly restarts and
> associated loss of connectivity.

AFAIK no.



> The network interface or the routing tables might also be reset hourly

They live on the same host.



> The ICAP server/service might be running out of descriptors or memory.

I'd expect it to log that, but I'll investigate further.



> One potentially useful test is to try to connect to the ICAP server
> _while the problem is happening_ using telnet or netcat. When Squid
> cannot establish a connection, can you?

I'll try, but it's going to be hard, since this happens for a few
minutes once a day at most.



> Packet captures can tell you whether other Squid-ICAP server connections
> were active at the time, whether from-Squid SYN packets were able to
> reach the ICAP server, etc.
>
> In other words, basic network troubleshooting steps...

As I said, they live on the same host, so it can't be a network problem.



> Higher timeout will delay HTTP client transactions for longer periods of
> time, of course. If you want to go down the road of finding workarounds,
> then check whether raising that timeout actually helps. It is not yet
> clear (to me) whether the connections just need more time to be
> established or are simply doomed.

It's not clear to me either, but I suspect they just need more time,
given that the trouble only lasts a few minutes.




>> Same for disabling icap_service_failure_limit?
>
> This is an essential ICAP service (icap_service bypass=off). I assume
> there is no backup service -- no adaptation_service_set in play here. If
> so, disabling the limit means that fewer HTTP transactions will be
> inconvenienced in the long run than if the service were to be suspended.
>   Hence, fewer ICAP errors will be delivered to Squid clients.

Agreed.



> You can also enable bypass.

I guess this would open a potential for an attack.
DoS the service (antivirus), then let something nasty pass...



> Fixing the problem would be a much better solution, of course.

Sure, I know these are workarounds and I'd rather avoid them, but I'll
need to consider them as a last resort.



  bye & Thanks
        av.

Re: Squid "suspending ICAP service for too many failures"

Amos Jeffries
Administrator
On 31/01/21 6:08 am, Andrea Venturoli wrote:

> On 1/29/21 8:38 PM, Alex Rousskov wrote:
>
>
>> Packet captures can tell you whether other Squid-ICAP server connections
>> were active at the time, whether from-Squid SYN packets were able to
>> reach the ICAP server, etc.
>>
>> In other words, basic network troubleshooting steps...
>
> As I said, they live on the same host, so it can't be a network problem.
>

FYI, that conclusion does not follow. Even on the same host there is a
full TCP/IP networking stack between Squid and ICAP server doing things
to the packets. All localhost removes is the potential problems due to
differences in machine networking stacks.

Network config, firewall rules, packet handling, and/or protocol
negotiation activities between the software are all still happening and
may affect the outcome.



Amos

Re: Squid "suspending ICAP service for too many failures"

Andrea Venturoli
On 1/31/21 1:11 AM, Amos Jeffries wrote:

>> As I said, they live on the same host, so it can't be a network problem.
>>
>
> FYI, that conclusion does not follow. Even on the same host there is a
> full TCP/IP networking stack between Squid and ICAP server doing things
> to the packets. All localhost removes is the potential problems due to
> differences in machine networking stacks.
>
> Network config, firewall rules, packet handling, and/or protocol
> negotiation activities between the software are all still happening and
> may affect the outcome.

Right.
It could be a network problem.
However, I think that's unlikely (also given the host is monitored and I
don't see alerts or other signs of such troubles).
While I cannot exclude that completely, I think I should first
investigate in other directions.

  bye & Thanks
        av.

Re: Squid "suspending ICAP service for too many failures"

Andrea Venturoli
On 2/1/21 8:56 AM, Andrea Venturoli wrote:

> It could be a network problem.
> However, I think that's unlikely (also given the host is monitored and I
> don't see alerts or other signs of such troubles).
> While I cannot exclude that completely, I think I should first
> investigate in other directions.

Finally I have some insight: this happens when ClamAV receives a new
virus definitions database and so reloads.

Note that I'm using ClamAV 0.103, which "reloads the signature database
without blocking scanning" (and no, I didn't disable this).
So probably, while that works in theory, the reload slows the system
down, hence the timeouts.

I'm now trying increased timeouts or disabling the ICAP failure limit.

Thanks to all who helped.

  bye
        av.