Re: parent peer timeout (Amos Jeffries)

Ignacio Freyre
Hi Amos, thanks for taking the time to analyze this.

>Are you actually terminating the peer, or just simulating it some other way?
My method of testing is shutting down the service on the parent "192.168.1.1" with "/etc/init.d/squid stop". With this in place there are no remaining active connections and no new ones are being established; all I see is TCP RST responses.
It seems there is a TCP timer that is not configurable, because of the time it takes to notice the dead peer:
> 2017/11/20 22:55:02| Ready to serve requests.
> 2017/11/20 22:55:03| storeLateRelease: released 0 objects
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| Detected DEAD Parent: 192.168.1.1
My objective is to configure dead peer detection based only on TCP connections. Can this be achieved?

Do I need to allow a specific type of traffic with "cache_peer_access" statements so that dead peer detection happens? If I comment out those lines, dead peer detection works, but I need them enabled so I can filter what traffic those parent peers accept.


regards,
ignacio



On 21/11/17 14:09, Ignacio Freyre wrote:
> Hi guys, I have a simple configuration that I'm testing with 2 parent proxies for a specific domain: if parent proxy 192.168.1.1 fails, fail over to proxy 192.168.1.2.
> I have a couple of questions:
> 1) Having configured "connect-timeout=3" and "connect-fail-limit=2", failover takes about 2 minutes. How can I reduce failover time?

Are you actually terminating the peer, or just simulating it some other way?

The behaviour you are seeing is what will happen for the particular
error you cause to happen. I suspect you are only simulating a firewall
rule table overload (ie firewall suddenly stops allowing *new*
connections) instead of actual peer machine disconnect or shutdown.

The connect-timeout=3 is to make *new* TCP connections signal failure if
the SYN+ACK takes more than 3 seconds to return. Otherwise it is a
successful connect.
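
To see which kind of failure a given test actually produces, a small
probe can help. The following is a minimal Python sketch, not part of
Squid: probe_tcp is a hypothetical helper that makes one TCP connect
attempt with a 3 second timeout (mirroring connect-timeout=3) and
reports whether the connect succeeded, was refused (RST), or timed out
waiting for the SYN+ACK.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and classify the outcome (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No SYN+ACK (and no error) arrived within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The remote end answered with RST: nothing is listening.
        return "refused"
    except OSError as exc:
        # Other failures, e.g. EHOSTUNREACH or ENETUNREACH.
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 192.168.1.1:3128 is the parent peer from the configuration in this thread.
    print(probe_tcp("192.168.1.1", 3128))
```

A stopped service on a reachable host normally yields "refused" almost
at once, while a firewall that silently drops packets yields "timeout".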

Added to that, Squid is HTTP/1.1 software these days, which means it
uses persistent connections and pipelining to reduce the need for new
TCP connections. So that type of network failure may have zero effect on
the proxy<->peer communications, exactly as intended by the HTTP/1.1 design.


> 2) If I enable cache_peer_access statements, failover never happens because the peers don't get detected as dead


You disabled the features used as primary methods of detecting dead
peers (no-query no-digest).

Additionally restricting traffic with cache_peer_access removes
additional hints from HTTP and TCP traffic.


It is hard to say how those two things are impacting your proxy's peer
selection logic, since it is also complicated by the things mentioned
above about #1.


>
> #CONFIGURATION START
> #hostname
> visible_hostname testing
>
> #parent proxy's
> cache_peer 192.168.1.1 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
> cache_peer 192.168.1.2 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
>
> #send traffic to peers
> acl foo_url url_regex site\.domain\.com
> never_direct allow foo_url

regex is the second slowest ACL type around. To match a domain,
generally use the dstdomain ACL type instead.
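
Besides speed, the two approaches also differ in what they match. The
Python sketch below is a simplified model only, not Squid's matcher:
regex_matches searches the whole URL the way an unanchored url_regex
does, while domain_matches (a hypothetical helper) compares only the
host name, with a leading dot meaning "this domain and its subdomains".

```python
import re
from urllib.parse import urlsplit

# Same pattern as the foo_url ACL quoted above.
URL_PATTERN = re.compile(r"site\.domain\.com")


def regex_matches(url: str) -> bool:
    """Unanchored regex search over the entire URL."""
    return URL_PATTERN.search(url) is not None


def domain_matches(url: str, domain: str) -> bool:
    """Simplified domain comparison against the URL's host only."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    if domain.startswith("."):
        # ".domain.com" covers domain.com and any subdomain of it.
        return host == domain[1:] or host.endswith(domain)
    return host == domain


if __name__ == "__main__":
    url = "http://other.example/?ref=site.domain.com"
    print(regex_matches(url), domain_matches(url, "site.domain.com"))
```

The regex also matches URLs that merely mention the name in a path or
query string, which a host-only comparison does not.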


>
> #peer access
> cache_peer_access 192.168.1.1 deny !foo_url
> cache_peer_access 192.168.1.2 deny !foo_url
>
> #allow all for testing purposes
> http_access allow all
>

Not a good idea even for testing purposes. If there is a problem with
your intended http_access rules, that needs solving before anything else
can be properly investigated, since what the proxy is allowed to handle
affects what can happen for outbound attempts.


> # Squid normally listens to port 3128
> http_port 3128
>
> # Leave coredumps in the first cache dir
> coredump_dir /var/spool/squid
>
> # Add any of your own refresh_pattern entries above these.
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
> refresh_pattern .               0       20%     4320
> #CONFIGURATION END
>
> LOGS that I see when peer is detected as dead
> 2017/11/20 22:55:02| Ready to serve requests.
> 2017/11/20 22:55:03| storeLateRelease: released 0 objects
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| Detected DEAD Parent: 192.168.1.1
>

Configure "debug_options 28,3" to see the peer selection results.


Amos

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Amos Jeffries
Administrator
On 22/11/17 05:00, Ignacio Freyre wrote:
> Hi Amos, thanks for taking the time to analyze this.
>
>> Are you actually terminating the peer, or just simulating it some other way?
> My method of testing is shutting down the service on the parent "192.168.1.1" with "/etc/init.d/squid stop". With this in place there are no remaining active connections and no new ones are being established; all I see is TCP RST responses.

Ah, add to your tests a check to see when that process actually stops.
It is quite likely that a long portion of those 2 minutes is the peer
doing its slow graceful shutdown procedure - during which time it will
stay LIVE and not DEAD.
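
One way to add such a check is to time how long the peer's port keeps
accepting new connections after the stop command is issued. This is a
minimal Python sketch under stated assumptions: port_accepts and
seconds_until_closed are hypothetical helpers, and the sketch only
observes the listening port. It makes no claim about when during
shutdown Squid closes that port or drops existing connections.

```python
import socket
import time


def port_accepts(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a new TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def seconds_until_closed(host, port, interval=0.5, max_wait=300.0,
                         accepts=port_accepts):
    """Poll until the port stops accepting; return elapsed seconds or None."""
    start = time.monotonic()
    while time.monotonic() - start < max_wait:
        if not accepts(host, port):
            return time.monotonic() - start
        time.sleep(interval)
    return None  # still accepting connections after max_wait seconds


if __name__ == "__main__":
    # Start this right after stopping the parent peer from this thread.
    print(seconds_until_closed("192.168.1.1", 3128))
```

Comparing the reported time with the timestamps in cache.log shows how
much of the delay belongs to the peer's shutdown rather than to the
detection itself.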

You may also want to monitor the TCP state of the connections from Squid
to the peer. Termination by the endpoint may not immediately trigger
full connection closure all the way into Squid. So there is a bit of
delay there as well until Squid picks up on the change.

The best way to shut down Squid is with the "squid -k shutdown" command.
Use it twice in a row for a quick shutdown. The first use initiates the
shutdown, the second skips to the end of the graceful delay.


> It seems there is a TCP timer that is not configurable, because of the time it takes to notice the dead peer:
>> 2017/11/20 22:55:02| Ready to serve requests.
>> 2017/11/20 22:55:03| storeLateRelease: released 0 objects
>> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
>> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
>> 2017/11/20 22:56:55| Detected DEAD Parent: 192.168.1.1
> My objective is to configure dead peer detection based only on TCP connections. Can this be achieved?

Yes, by the means you already configured.

Also, ICMP is not optional. Ensure you have it working in your network.
TCP connect errors are reported via ICMP from the network router(s) to
Squid almost immediately, instead of after whole seconds of waiting.
That should make the connect-timeout= setting mostly irrelevant.


>
> Do I need to allow a specific type of traffic with "cache_peer_access" statements so that dead peer detection happens? If I comment out those lines, dead peer detection works, but I need them enabled so I can filter what traffic those parent peers accept.
>

What you configured should have been fine.

The issue is just that by relying only on the TCP/HTTP traffic for
detection, reducing traffic sent to the peer also reduces its chances to
detect failures. YMMV as to whether that is a good thing or not.

Amos