forward_max_tries 1 has no effect


forward_max_tries 1 has no effect

Ivan Larionov
Hello.

We have an issue where squid re-forwards / retries a failed request even though forward_max_tries is set to 1. It happens when there's no response and the parent just closes the connection.

Relevant parts of the configuration, so you can understand the architecture:

cache_peer 127.0.0.1 parent 18070 0 no-query no-digest no-netdb-exchange name=proxy
never_direct allow all
negative_ttl 0 seconds
forward_max_tries 1
retry_on_error off

The traffic flow from tcpdump is like this:

squid to parent:   GET http://HOST/
parent to squid:   ACK

waiting (no response for ~40 seconds)

parent to squid:   FIN, ACK
squid to parent:   FIN, ACK
parent to squid:   FIN

Immediately after that:

squid to parent:   GET http://HOST/ (again)

Debug logs from ALL,2:

http.cc(2229) sendRequest: HTTP Server local=127.0.0.2:46867 remote=127.0.0.1:18070 FD 26 flags=1
http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
http.cc(1299) continueAfterParsingHeader: WARNING: HTTP: Invalid Response: No object data received for http://HOST:12345/ AKA HOST/
FwdState.cc(655) handleUnregisteredServerEnd: self=0x430a438*2 err=0x445fcf8 http://HOST:12345/
http.cc(2229) sendRequest: HTTP Server local=127.0.0.2:34417 remote=127.0.0.1:18070 FD 26 flags=1
http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1

It doesn't happen 100% of the time. Sometimes squid returns a 502 after the 1st try, sometimes it retries once. I also haven't seen more than 1 retry.

Could it be a bug? We'd really like to disable these retries.

Squid Cache: Version 3.5.27
Service Name: squid
configure options:  '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/sbin' '--sbindir=/usr/sbin' '--sysconfdir=/etc/squid' '--libdir=/usr/lib' '--libexecdir=/usr/lib/squid' '--includedir=/usr/include' '--datadir=/usr/share' '--sharedstatedir=/usr/com' '--localstatedir=/var' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-epoll' '--enable-removal-policies=heap,lru' '--enable-storeio=aufs,rock' '--enable-delay-pools' '--with-pthreads' '--enable-cache-digests' '--with-large-files' '--with-maxfd=16384' '--enable-htcp'

--
With best regards, Ivan Larionov.


Re: forward_max_tries 1 has no effect

Amos Jeffries
Administrator
On 23/11/17 14:20, Ivan Larionov wrote:
> Hello.
>
> We have an issue with squid when it tries to re-forward / retry failed
> request even when forward_max_tries is set to 1. The situation when it
> happens is when there's no response, parent just closes the connection.
>
...
>
> It doesn't happen 100% times. Sometimes squid returns 502 after the 1st
> try, sometimes it retries once. Also I haven't seen more than 1 retry.
>

Please enable debug_options 44,2 to see what destinations your Squid is
actually finding.
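
For example, something like this keeps the general log noise low while
raising the peer selection section:

debug_options ALL,1 44,2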

forward_max_tries is just a rough cap on the number of server names
which can be found when generating the destination list. The actual
destination count can exceed it if one or more of those servers happens
to have multiple IPs to try.

The overall transaction can also involve retries if one of the other
layers (TCP or HTTP) contains retry semantics towards a single server.



> Could it be a bug? We'd really like to disable these retries.
>

Why are you trying to break HTTP?
What is the actual problem you are trying to resolve here?


Amos

Re: forward_max_tries 1 has no effect

Ivan Larionov
>
> On Nov 23, 2017, at 12:32 AM, Amos Jeffries <[hidden email]> wrote:
>
> On 23/11/17 14:20, Ivan Larionov wrote:
>> Hello.
>> We have an issue with squid when it tries to re-forward / retry failed request even when forward_max_tries is set to 1. The situation when it happens is when there's no response, parent just closes the connection.
> ...
>> It doesn't happen 100% times. Sometimes squid returns 502 after the 1st try, sometimes it retries once. Also I haven't seen more than 1 retry.
>
> Please enable debug_options 44,2 to see what destinations your Squid is actually finding.

I'll check this on Monday.

>
> max_forward_tries is just a rough cap on the number of server names which can be found when generating that list. The actual destinations count can exceed it if one or more of the servers happens to have multiple IPs to try.
>
> The overall transaction can involve retries if one of the other layers (TCP or HTTP) contains retry semantics to a single server.
>
>
>
>> Could it be a bug? We'd really like to disable these retries.
>
> Why are trying to break HTTP?
> What is the actual problem you are trying to resolve here?
>

Why do you think I'm trying to break HTTP?

squid forwards the request to the parent but the parent misbehaves and just closes the connection after 40 seconds. I'm trying to prevent a retry of the request in that situation. Why does squid retry when I never asked it to and specifically said "forward_max_tries 1"?

And this is not a connection failure: squid successfully establishes the connection and sends the request, the parent ACKs it, but it never responds and proactively closes the connection.

We're already fixing the parent's behavior, but we still want to disable retries on the squid side.

>
> Amos

Re: forward_max_tries 1 has no effect

Amos Jeffries
Administrator

On 24/11/17 10:03, Ivan Larionov wrote:

>>
>> On Nov 23, 2017, at 12:32 AM, Amos Jeffries <[hidden email]> wrote:
>>
>> On 23/11/17 14:20, Ivan Larionov wrote:
>>> Hello.
>>> We have an issue with squid when it tries to re-forward / retry failed request even when forward_max_tries is set to 1. The situation when it happens is when there's no response, parent just closes the connection.
>> ...
>>> It doesn't happen 100% times. Sometimes squid returns 502 after the 1st try, sometimes it retries once. Also I haven't seen more than 1 retry.
>>
>> Please enable debug_options 44,2 to see what destinations your Squid is actually finding.
>
> I'll check this on Monday.
>
>>
>> max_forward_tries is just a rough cap on the number of server names which can be found when generating that list. The actual destinations count can exceed it if one or more of the servers happens to have multiple IPs to try.
>>
>> The overall transaction can involve retries if one of the other layers (TCP or HTTP) contains retry semantics to a single server.
>>
>>
>>
>>> Could it be a bug? We'd really like to disable these retries.
>>
>> Why are trying to break HTTP?
>> What is the actual problem you are trying to resolve here?
>>
>
> Why do you think I'm trying to break HTTP?
>
> squid forwards the request to parent but parent misbehaves and just closes the connection after 40 seconds. I'm trying to prevent retry of request in such situation. Why squid retries if I never asked him to do it and specifically said "forward_max_tries 1".
>
> And this is not a connection failure, squid successfully establishes the connection and sends the request, parent ACKs it, just never responses back and proactively closes the connection.
>

This is not misbehaviour on the part of either Squid or the parent.
<https://tools.ietf.org/html/rfc7230#section-6.3.1>
"Connections can be closed at any time, with or without intention."


As has been discussed in other threads recently, there are servers out
there starting to greylist TCP connections: closing the first one some
time *after* SYN+ACK regardless of what the proxy sends, and accepting
any followup connection attempts.

NP: That can result in exactly the behaviour you describe from the peer,
as Squid does not wait for a FIN to arrive before sending its upstream
HTTP request - Squid will "randomly" get a FIN or a RST depending on
whether the FIN or the DATA packet wins the race into the Squid machine's
TCP stack. FIN and RST have different retry properties, which might
explain your "sometimes retries" behaviour.


Also, TCP connections fail quite often for many other reasons anyway -
anything from power fluctuations at a router to BGP switching the packet
route and dropping packets. These are most often short-term situations
which have resolved by the time the repeat is attempted.

What you are trying to do will result in Squid being unable to cope with
any of these transitory restrictions from the TCP environment, and will
force the client to receive a terminal error page.
  That will greatly slow down detection of, and recovery from, the
slightly longer-lived TCP issues in Squid itself, and may result in N
other clients also unnecessarily receiving the same error response as
bad connection attempts get spread between many clients (all getting
errors) instead of being isolated to the one/few who hit it when the
issue initially occurs.

Expanding the retries to a large number (i.e. the recent default change
to 25), or keeping it at a low number (e.g. the old default of 5), are
reasonable things to do depending on the network stability of your
upstreams. But going all the way to 0 retries is guaranteed to lead to
more client-visible problems than necessary.


All that aside, I phrased it as a question because you might have had a
good reason for increasing the visible failure rates.


> We're already fixing parent behavior, but still want to disable retries on squid side.
>


Since you describe this as peer misbehaviour, handling it with Squid's
normal TCP failure recovery is the best behaviour. Retry is the intended,
correct behaviour for a proxy to perform on idempotent requests.
In your case, up to a value of 1 retry before declaring a non-temporary
route failure.

NP: idempotent vs non-idempotent may be another reason behind the
observed behaviour of retry happening only sometimes.


If you are doing this due to overall latency/delay on the affected
client traffic, you would be better off reducing the timeouts involved
(the cache_peer connect-timeout= parameter AFAICS) than aiming at a retry
count of 0. Perhaps also require a few standby=N persistent
connections to be maintained, if the peer is HTTP/1.1 capable.
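
For illustration only - a sketch based on your existing cache_peer line,
with placeholder values you would tune for your setup (connect-timeout=
is in seconds; standby=N needs an HTTP/1.1 capable peer):

cache_peer 127.0.0.1 parent 18070 0 no-query no-digest no-netdb-exchange connect-timeout=5 standby=10 name=proxy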

Amos

Re: forward_max_tries 1 has no effect

Ivan Larionov
I think I found the part of the config which triggers the retry.

======

# Check for newproxy request header
acl newproxy_acl req_header x-use-newproxy -i true

# proxy
cache_peer 127.0.0.1 parent 18070 0 no-query no-digest no-netdb-exchange name=proxy
cache_peer_access proxy deny newproxy_acl

# newproxy
cache_peer 127.0.0.1 parent 18079 0 no-query no-digest no-netdb-exchange name=newproxy
cache_peer_access newproxy allow newproxy_acl
cache_peer_access newproxy deny all

never_direct allow all

======

I see retries only when the squid config has 2 parents. If I comment out everything related to "newproxy", I can't reproduce this behavior anymore.

BUT

My test request could only be forwarded to "proxy" since it doesn't have the "x-use-newproxy" header.

Which results in the following:

squid has 2 parents
the request can only be forwarded to the 1st parent due to the ACL
the parent doesn't respond and closes the TCP connection
squid retries with the same parent, ignoring "forward_max_tries 1"

I also want to clarify some facts which make me think that it could be a bug.

1. There are no issues with the TCP connection. Squid successfully connects to the parent and sends an HTTP request.
2. The parent ACKs the HTTP request and then correctly closes the connection with FIN,ACK after 40 seconds. There are no TCP timeouts/reconnects involved. The only issue here is that the parent doesn't send any HTTP response.
3. forward_max_tries is set to 1 to make sure squid won't retry. The parent handles the retry itself, so we don't want squid to make any additional retries.


From the FAQ (https://wiki.squid-cache.org/SquidFaq/InnerWorkings#When_does_Squid_re-forward_a_client_request.3F):

> Squid does not try to re-forward a request if at least one of the following conditions is true:
> …
> The number of forwarding attempts exceeded forward_max_tries. For example, if you set forward_max_tries to 1 (one), then no requests will be re-forwarded.
> …
> Squid has no alternative destinations to try. Please note that alternative destinations may include multiple next hop IP addresses and multiple peers.
> …

Another part of the config which may or may not be related (this is to increase the number of local ports to use):

# 33% of traffic per local IP
acl third random 1/3
acl half random 1/2
tcp_outgoing_address 127.0.0.2 third
tcp_outgoing_address 127.0.0.3 half
tcp_outgoing_address 127.0.0.4

Logs:

ALL,2 (includes 44,2):

2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(220) doAccept: New connection on FD 15
2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(295) acceptNext: connection on local=0.0.0.0:3128 remote=[::] FD 15 flags=9
2017/11/27 15:53:40.543| 11,2| client_side.cc(2372) parseHttpRequest: HTTP Client local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1
2017/11/27 15:53:40.543| 11,2| client_side.cc(2373) parseHttpRequest: HTTP Client REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
Host: HOST:12345
User-Agent: curl/7.51.0
Accept: */*
Proxy-Connection: Keep-Alive


----------
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745) clientAccessCheckDone: The request GET http://HOST:12345/ is ALLOWED; last ACL checked: localhost
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(721) clientAccessCheck2: No adapted_http_access configuration. default: ALLOW
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745) clientAccessCheckDone: The request GET http://HOST:12345/ is ALLOWED; last ACL checked: localhost
2017/11/27 15:53:40.543| 17,2| FwdState.cc(133) FwdState: Forwarding client request local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1, url=http://HOST:12345/
2017/11/27 15:53:40.543| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: http://HOST:12345/' via 127.0.0.1
2017/11/27 15:53:40.543| 44,2| peer_select.cc(280) peerSelectDnsPaths: Found sources for 'http://HOST:12345/'
2017/11/27 15:53:40.543| 44,2| peer_select.cc(281) peerSelectDnsPaths:   always_direct = DENIED
2017/11/27 15:53:40.543| 44,2| peer_select.cc(282) peerSelectDnsPaths:    never_direct = ALLOWED
2017/11/27 15:53:40.543| 44,2| peer_select.cc(292) peerSelectDnsPaths:      cache_peer = local=127.0.0.3 remote=127.0.0.1:18070 flags=1
2017/11/27 15:53:40.543| 44,2| peer_select.cc(295) peerSelectDnsPaths:        timedout = 0
2017/11/27 15:53:40.543| 11,2| http.cc(2229) sendRequest: HTTP Server local=127.0.0.3:57091 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:53:40.543| 11,2| http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
User-Agent: curl/7.51.0
Accept: */*
Host: HOST:12345
Cache-Control: max-age=259200
Connection: keep-alive


----------

[SKIPPED 40 seconds until parent closes TCP connection with FIN,ACK]

2017/11/27 15:54:20.627| 11,2| http.cc(1299) continueAfterParsingHeader: WARNING: HTTP: Invalid Response: No object data received for http://HOST:12345/ AKA HOST/
2017/11/27 15:54:20.627| 17,2| FwdState.cc(655) handleUnregisteredServerEnd: self=0x3e31838*2 err=0x409b338 http://HOST:12345/
2017/11/27 15:54:20.627| 11,2| http.cc(2229) sendRequest: HTTP Server local=127.0.0.3:41355 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:54:20.627| 11,2| http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
User-Agent: curl/7.51.0
Accept: */*
Host: HOST:12345
Cache-Control: max-age=259200
Connection: keep-alive


----------

[SKIPPED 40 seconds again until parent closes TCP connection with FIN,ACK]

2017/11/27 15:55:00.728| ctx: enter level  0: 'http://HOST:12345/'
2017/11/27 15:55:00.728| 11,2| http.cc(719) processReplyHeader: HTTP Server local=127.0.0.3:41355 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:55:00.728| 11,2| http.cc(720) processReplyHeader: HTTP Server REPLY:
---------
HTTP/1.0 502 Bad Gateway
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>

----------
2017/11/27 15:55:00.728| ctx: exit level  0
2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable: StoreEntry::checkCachable: NO: not cachable
2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable: StoreEntry::checkCachable: NO: not cachable
2017/11/27 15:55:00.728| 88,2| client_side_reply.cc(2073) processReplyAccessResult: The reply for GET http://HOST:12345/ is ALLOWED, because it matched (access_log stdio:/var/log/squid/access.log line)
2017/11/27 15:55:00.728| 11,2| client_side.cc(1409) sendStartOfMessage: HTTP Client local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1
2017/11/27 15:55:00.728| 11,2| client_side.cc(1410) sendStartOfMessage: HTTP Client REPLY:
---------
HTTP/1.1 502 Bad Gateway
Date: Mon, 27 Nov 2017 23:54:20 GMT
Cache-Control: no-cache
Content-Type: text/html
X-Cache: MISS from ip-172-23-18-130
X-Cache-Lookup: MISS from ip-172-23-18-130:3128
Transfer-Encoding: chunked
Connection: keep-alive


----------


--
With best regards, Ivan Larionov.


Re: forward_max_tries 1 has no effect

Alex Rousskov
On 11/27/2017 05:19 PM, Ivan Larionov wrote:

> I see retries only when squid config has 2 parents. If I comment out
> everything related to "newproxy" I can't reproduce this behavior anymore.

The posted logs are not detailed enough to confirm or deny that IMO, but
I suspect that you are dealing with at least one bug.


> https://wiki.squid-cache.org/SquidFaq/InnerWorkings#When_does_Squid_re-forward_a_client_request.3F
>
>> Squid does not try to re-forward a request if at least one of the following conditions is true:
>>
>> The number of forwarding attempts exceeded forward_max_tries. For
>> example, if you set forward_max_tries to 1 (one), then no requests
>> will be re-forwarded.


AFAICT, there is an off-by-one bug in Squid that violates the above:

>     if (n_tries > Config.forward_max_tries)
>         return false;

The n_tries counter is incremented before Squid makes a request
forwarding attempt. With n_tries and Config.forward_max_tries both set
to 1, the quoted FwdState::checkRetry() code will not prevent
re-forwarding. There is a similar problem in FwdState::reforward(). This
reasoning needs confirmation/testing.

Please note that simply changing the ">" operator to ">=" may break
other things in ways that are difficult to detect with simple tests. The
correct fix may be more complex than it looks and may involve making
policy decisions regarding the meaning of forward_max_tries. The best
fix would remove the checkRetry() and reforward() duplication. This code
is difficult to work with; many related code names are misleading.
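
To illustrate that reasoning, here is a standalone sketch (not the
actual Squid source; the names merely mirror the real ones):

// Illustration of the suspected off-by-one in the retry guard.
#include <iostream>

static bool retryAllowed(int n_tries, int forward_max_tries)
{
    // n_tries was already incremented for the attempt that just failed,
    // so after the first (and supposedly only) attempt n_tries == 1.
    if (n_tries > forward_max_tries)
        return false;  // 1 > 1 is false, so this guard does not stop a retry
    return true;       // other checkRetry()/reforward() conditions omitted
}

int main()
{
    // forward_max_tries set to 1, one failed attempt: a retry is still allowed.
    std::cout << retryAllowed(1, 1) << "\n";  // prints 1
}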


>> Squid has no alternative destinations to try. Please note that
>> alternative destinations may include multiple next hop IP addresses
>> and multiple peers.

The fact that Squid sends two requests to the same peer with only one
peer address selected suggests that Squid is retrying a failed
persistent connection rather than re-forwarding after receiving a bad
response. Again, the logs are not detailed enough to distinguish the two
cases. I can only see that a single peer/destination address was
selected (not two), which is correct/expected behavior. I cannot see
what happened next with sufficient detail.

Going forward, you have several options, including:

A. Post a link to compressed ALL,7+ logs to confirm bug(s).
B. Fix the broken condition(s) in FwdState. See above.

HTH,

Alex.



Re: forward_max_tries 1 has no effect

Ivan Larionov
Thanks Alex, this is very helpful.

Another interesting fact is that I can't reproduce this issue if squid has no other traffic except my testing requests. But it's easy to reproduce when the server has other traffic.

The problem is that with other traffic I can't provide the whole log file with debug ALL,7 enabled, because it contains other requests.

So I tried to select only parts related to my test request (this is ALL,7):



--
With best regards, Ivan Larionov.


Re: forward_max_tries 1 has no effect

Alex Rousskov
On 11/28/2017 02:27 PM, Ivan Larionov wrote:

> Another interesting fact is that I can't reproduce this issue if squid
> has no other traffic except my testing requests. But it's easy to
> reproduce when server has other traffic.

I did not check your logs carefully, but I believe that (when things do
not work the way you expect) your Squid is retrying a failed persistent
connection (rather than re-forwarding the request after receiving a bad
response). See the "pconn race happened" line below:

> 1_retry.log:2017/11/28 12:55:12.731| 17,3| FwdState.cc(416) fail: ERR_ZERO_SIZE_OBJECT "Bad Gateway"
> 1_retry.log:2017/11/28 12:55:12.731| 17,5| FwdState.cc(430) fail: pconn race happened
> 1_retry.log:2017/11/28 12:55:12.731| 93,5| AsyncJob.cc(84) mustStop: HttpStateData will stop, reason: HttpStateData::continueAfterParsingHeader
> 1_retry.log:2017/11/28 12:55:12.731| 17,3| FwdState.cc(618) retryOrBail: re-forwarding (1 tries, 40 secs)
> 1_retry.log:2017/11/28 12:55:12.731| 17,4| FwdState.cc(621) retryOrBail: retrying the same destination


Assuming you tested with forward_max_tries set to 1, the retryOrBail
lines above confirm the off-by-one problem I was describing in my
previous response.

AFAICT, compensating by setting forward_max_tries to zero will _not_
work (for reasons unrelated to the problem at hand).


FWIW, your current options include those outlined at
http://wiki.squid-cache.org/SquidFaq/AboutSquid#How_to_add_a_new_Squid_feature.2C_enhance.2C_of_fix_something.3F


Alex.




Re: forward_max_tries 1 has no effect

Ivan Larionov
Thanks Alex.

Unfortunately I don't have enough C/C++ skills to fix it.


We've also changed the parent's behavior so that it will not silently close the connection but will return a 502 in this exact situation, and that seems to fix the unexpected squid re-forwarding.

--
With best regards, Ivan Larionov.
