High response times with Squid

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

High response times with Squid

Ahmad, Sarfaraz

Hi,

 

I am using Squid 4.5 with WCCP. Intercepting SSL by peeking at step1 and then deciding to either splice or bump upon the SNI.

I am noticing a weird behavior for some of my TCP connections.  Squid is taking over 20s to decide what do with the ClientHello sent by the browser. It is only after 20s that it decides to send out a ClientHello to the origin server and at the same time reply to the client with a ServerHello.

This behavior is hard to reproduce and only some clients are affected.

 

I will try to summarize what I see in cache.log with ALL, 6 debug options.

 

1)      Squid's INTERCEPTION thread/program receives a TCP SYN from workstation.

2019/02/06 17:23:19.070 kid1| 89,5| Intercept.cc(405) Lookup: address BEGIN: me/client= <SQUID_IP>:23129, destination/me= <CLIENT/BROWSER_IP>:58232

 

2)      Squid becomes the origin server and sets up the TCP connection.

2019/02/06 17:23:19.070 kid1| 5,5| AsyncCall.cc(93) ScheduleCall: TcpAcceptor.cc(339) will call httpsAccept(local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33, MXID_1101703) [call34733258]

2019/02/06 17:23:19.070 kid1| 5,5| AsyncCall.cc(38) make: make call httpsAccept [call34733258]

       2019/02/06 17:23:19.070 kid1| 33,4| client_side.cc(2776) httpsAccept: local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33 accepted, starting SSL negotiation.

 

3)      Squid checks the SSL ACLs for the destination IP.

2019/02/06 17:23:19.071 kid1| 28,5| Acl.cc(124) matches: checking (ssl_bump rules)

2019/02/06 17:23:19.071 kid1| 28,5| Checklist.cc(397) bannedAction: Action 'ALLOWED/6' is not banned

2019/02/06 17:23:19.071 kid1| 28,5| Acl.cc(124) matches: checking (ssl_bump rule)

2019/02/06 17:23:19.071 kid1| 28,5| Acl.cc(124) matches: checking no_ssl_bump_src_ip

2019/02/06 17:23:19.071 kid1| 28,3| Ip.cc(538) match: aclIpMatchIp: '<CLIENT/BROWSER_IP>:58232' NOT found

2019/02/06 17:23:19.071 kid1| 28,3| Acl.cc(151) matches: checked: no_ssl_bump_src_ip = 0

 

4)      Squid decides to allow connections to the remote IP i.e <ORIGIN_SERVER_ON_THE_INTERNET> and decides to peek at the SNI (will accept ClientHello), hence fakes a CONNECT request

2019/02/06 17:23:19.071 kid1| 28,3| Checklist.cc(163) checkCallback: ACLChecklist::checkCallback: 0x1fb19fd8 answer=ALLOWED

2019/02/06 17:23:19.071 kid1| 33,2| client_side.cc(2744) httpsSslBumpAccessCheckDone: sslBump action peekneeded for local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33

2019/02/06 17:23:19.071 kid1| 33,2| client_side.cc(3395) fakeAConnectRequest: fake a CONNECT request to force connState to tunnel for ssl-bump

 

5)      The FAKE connect requests again runs through ACLs.  20ms are spent for DNS PTR lookup. A total of 30ms is spent parsing ACLs.

2019/02/06 17:23:19.103 kid1| 85,2| client_side_request.cc(758) clientAccessCheckDone: The request CONNECT <ORIGIN_SERVER_ON_THE_INTERNET> is ALLOWED; last ACL checked: localnet

2019/02/06 17:23:19.103 kid1| 93,4| AccessCheck.cc(145) checkCandidates: NO candidates left

2019/02/06 17:23:19.103 kid1| 93,3| AccessCheck.cc(196) callBack: NULL

2019/02/06 17:23:19.103 kid1| 93,5| AsyncCall.cc(26) AsyncCall: The AsyncCall Adaptation::Initiator::noteAdaptationAclCheckDone constructed, this=0x1552b0b0 [call34733267]

 

6)      FAKE connect is processed and Squid reads the TCP connection, gets the ClientHello and reads the SNI.

2019/02/06 17:23:19.104 kid1| 33,5| AsyncCall.cc(38) make: make call Server::doClientRead [call34733270]

2019/02/06 17:23:19.104 kid1| 33,5| AsyncJob.cc(123) callStart: Http1::Server status in: [ job2971436]

2019/02/06 17:23:19.104 kid1| 33,5| Server.cc(104) doClientRead: local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33

2019/02/06 17:23:19.104 kid1| 5,3| Read.cc(92) ReadNow: local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33, size 4096, retval 203, errno 0

2019/02/06 17:23:19.104 kid1| 33,5| AsyncCall.cc(26) AsyncCall: The AsyncCall ConnStateData::requestTimeout constructed, this=0x3d42d40 [call34733271]

2019/02/06 17:23:19.104 kid1| 5,3| comm.cc(559) commSetConnTimeout: local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33 timeout 300

2019/02/06 17:23:19.104 kid1| 83,5| Handshake.cc(404) parseExtensions: first unsupported extension: 19018

2019/02/06 17:23:19.104 kid1| 83,3| Handshake.cc(497) parseSniExtension: host_name=<ORIGIN_SERVER_SNI>

 

7)      This is followed by another round of ACL processing now that we have the SNI.

 

2019/02/06 17:23:19.106 kid1| 33,3| Pipeline.cc(35) front: Pipeline 0xc2d7b60 front 0x1ae2e730*2

2019/02/06 17:23:19.106 kid1| 33,3| Pipeline.cc(35) front: Pipeline 0xc2d7b60 front 0x1ae2e730*3

2019/02/06 17:23:19.107 kid1| 83,5| Session.cc(103) NewSessionObject: SSL_new session=0x78136a0

2019/02/06 17:23:19.107 kid1| 83,5| bio.cc(616) squid_bio_ctrl: 0xcd0fe80 104(6000, 0x7ffc32a6e6b4)

2019/02/06 17:23:19.107 kid1| 83,5| Session.cc(162) CreateSession: link FD 40 to TLS session=0x78136a0

2019/02/06 17:23:19.107 kid1| 33,5| client_side.cc(2535) httpsCreate: will negotiate TLS on local=<ORIGIN_SERVER_ON_THE_INTERNET>:443 remote=<CLIENT/BROWSER_IP>:58232 FD 40 flags=33

2019/02/06 17:23:19.107 kid1| 5,5| ModEpoll.cc(117) SetSelect: FD 40, type=1, handler=0, client_data=0, timeout=0

2019/02/06 17:23:19.107 kid1| 83,5| client_side.cc(3324) startPeekAndSplice: Peek and splice at step2 done. Start forwarding the request!!!

2019/02/06 17:23:19.107 kid1| 17,3| FwdState.cc(340) Start: '<ORIGIN_SERVER_ON_THE_INTERNET>:443'

 

 

8)      No ServerHello has been sent back to the client yet, Squid starts a TCP connection with the origin server

2019/02/06 17:23:19.110 kid1| 5,4| AsyncJob.cc(123) callStart: Comm::ConnOpener status in: [ job2971439]

2019/02/06 17:23:19.110 kid1| 5,5| ConnOpener.cc(350) doConnect: local=0.0.0.0 remote=<ORIGIN_SERVER_ON_THE_INTERNET>:443 flags=1: Comm::OK - connected

2019/02/06 17:23:19.110 kid1| 5,4| ConnOpener.cc(155) cleanFd: local=0.0.0.0 remote=<ORIGIN_SERVER_ON_THE_INTERNET>:443 flags=1 closing temp FD 50

 

9)      Squid starts a TLS session with the remote/origin server, sends the ClientHello. A total of 0.4 seconds in Squid sending clienthello to origin server. This is probably when Squid decides to send back the ServerHello to the browser.

2019/02/06 17:23:19.110 kid1| 83,5| Session.cc(103) NewSessionObject: SSL_new session=0x14899390

2019/02/06 17:23:19.111 kid1| 83,5| bio.cc(616) squid_bio_ctrl: 0x1492ef80 104(6001, 0x7ffc32a6e884)

2019/02/06 17:23:19.111 kid1| 83,5| Session.cc(162) CreateSession: link FD 50 to TLS session=0x14899390

2019/02/06 17:23:19.111 kid1| 83,5| PeerConnector.cc(123) initialize: local=<SQUID_IP>:44498 remote=<ORIGIN_SERVER_ON_THE_INTERNET>:443 FD 50 flags=1, session=0x14899390

 

So somewhere between Step 8 and Step 9, Squid is taking over 20s.

 

What could possibly be keeping it busy ?

I have external ACL helpers but they work just fine. Average service time is 1ms. Squid has not even spawning all helpers that it has been configured to do. (not exhausted the upper limit).

DNS resolution is also good. All CPU/MEM resources look just fine and again this affects only a subset of the traffic.  I don’t have the failure logs from when this actually happens.

 

 

UPDATE: This problem statement seems local to a few websites. Outside of the proxy, those websites quite quickly as is expected.

 

Any thoughts on where to look ?  other bits and pieces I could check ?  I have jumbo frames enabled (9000 bytes) but am running the proxies at L2 1500 MTU.

 

-Ahmad

 

 

 


_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: High response times with Squid

Amos Jeffries
Administrator
On 8/02/19 7:30 pm, Ahmad, Sarfaraz wrote:

> Hi,
>
>  
>
> I am using Squid 4.5 with WCCP. Intercepting SSL by peeking at step1 and
> then deciding to either splice or bump upon the SNI.
>
> I am noticing a weird behavior for some of my TCP connections.  Squid is
> taking over 20s to decide what do with the ClientHello sent by the
> browser. It is only after 20s that it decides to send out a ClientHello
> to the origin server and at the same time reply to the client with a
> ServerHello.
>
> This behavior is hard to reproduce and only some clients are affected.
>
>  
>
> I will try to summarize what I see in cache.log with ALL, 6 debug options.
>
>  
>
> 1)      Squid's INTERCEPTION thread/program receives a TCP SYN from
> workstation.
>
> 2019/02/06 17:23:19.070 kid1| 89,5| Intercept.cc(405) Lookup: address
> BEGIN: me/client= *<SQUID_IP>:*23129, destination/me=
> *<CLIENT/BROWSER_IP>:*58232
>

No. This is looking up the original TCP dst-IP:port in the kernel NAT
tables.


>  
>
> 2)      Squid becomes the origin server and sets up the TCP connection.
>

No. The local= log values are a simple statement of the TCP packet
values received from the NAT system at (1). Squid is an MITM in this
setup, so the client *thinks* it is talking to the origin.

Being an MITM Squid is designed to operate as transparently as possible,
but at no time has the abilities of the origin server.


> 2019/02/06 17:23:19.070 kid1| 5,5| AsyncCall.cc(93) ScheduleCall:
> TcpAcceptor.cc(339) will call
> httpsAccept(local*=<ORIGIN_SERVER_ON_THE_INTERNET>*:443
> remote=*<CLIENT/BROWSER_IP>:*58232 FD 40 flags=33, MXID_1101703)
> [call34733258]
>

...

>
> 8)      No ServerHello has been sent back to the client yet, Squid
> starts a TCP connection with the origin server
>
> 2019/02/06 17:23:19.110 kid1| 5,4| AsyncJob.cc(123) callStart:
> Comm::ConnOpener status in: [ job2971439]
>
> 2019/02/06 17:23:19.110 kid1| 5,5| ConnOpener.cc(350) doConnect:
> local=0.0.0.0 remote*=<ORIGIN_SERVER_ON_THE_INTERNET>:*443 flags=1:
> Comm::OK - connected
>
> 2019/02/06 17:23:19.110 kid1| 5,4| ConnOpener.cc(155) cleanFd:
> local=0.0.0.0 remote=<*ORIGIN_SERVER_ON_THE_INTERNET*>:443 flags=1
> closing temp FD 50
>
>  
>
> 9)      Squid starts a TLS session with the remote/origin server, sends
> the ClientHello. A total of 0.4 seconds in Squid sending clienthello to
> origin server. This is probably when Squid decides to send back the
> ServerHello to the browser.

Don't guess. Check.

Either you have step2 / client-first bumping - in which case the Squid
serverHello would have been sent to the client at (7).

Or, you have step3 / server-first bumping - in which case Squid cannot
send a serverHello to the client until it has received the origin's
serverHello. Which still has not yet been received despite your trace
ending here.

...
>
> 2019/02/06 17:23:19.111 kid1| 83,5| PeerConnector.cc(123) initialize:
> local=*<SQUID_IP>*:44498 remote=*<ORIGIN_SERVER_ON_THE_INTERNET>*:443 FD
> 50 flags=1, session=0x14899390
>
>  
>
> So somewhere between Step 8 and Step 9, Squid is taking over 20s.
>

There is only 1 millisecond between those steps.

The client connection was received at 17:23:19.070, your (9) finished at
17:23:19.111 -> so there is your 0.41 seconds. If there is any 20s gap
for this transaction it is later in the log part you have not shown.


>
> What could possibly be keeping it busy ?
>

Other transactions? Nothing?

What is going on at (9) is *preparing* to send a TLS clientHello. At the
point your log stops it still has not actually been written to the network.

There is actually still a good half of the SSl-Bump process to happen:
 - assemble the Squid clientHello bytes,
 - send that to origin
 - receive origin serverHello
 - validate the origin details
 - HTTP fetch missing certificates (if any)
   - re-validate the origin details (repeat fetch as necessary)
 - formulate the Squid serverHello
 - send that to client
 - receive HTTP request over the secured client connection

... then all the HTTP(S) message processing on the resulting connection.


>
> UPDATE: This problem statement seems local to a few websites. Outside of
> the proxy, those websites quite quickly as is expected.
>
>
> Any thoughts on where to look ?  other bits and pieces I could check ?
>  I have jumbo frames enabled (9000 bytes) but am running the proxies at
> L2 1500 MTU.
>

It is hard to say without also knowing your squid.conf settings and what
sites specifically you are having trouble with,

Could be anything from a packet loss in some remote router halfway
around the world. To misconfiguration of something in your network. To
misconfiguration by the origin server admin.

IMO that last one is most likely if the behaviour is only delay (not
errors) and with only certain domains.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: High response times with Squid

Ahmad, Sarfaraz
Hi again,
I made some progress on this.
To reiterate, I am peeking at the SNI and then bump all connections to the origin server in context of this problem. ( the origin server is seamless.com )

Here are the new findings ,
1) The 20sec lag is noticed even when I splice the connection.
2) It 99% has to do with the following slow ACL acl.

acl deny_explicit_dstdomain dstdomain "/etc/squid/acls/deny_explicit_dstdomain"

I see PTR lookups failing when Squid tries to validate my ACLs. When I disable that ACL, the 20second lag is gone. So I am pretty confident that subsequent PTR lookups are causing the delay here.
I don't see a configuration directive with which I can configure how many times Squid retries the lookup.
I see one that sets the timeout though (dns_timeout  defaults 30 seconds).

Could you guys give me some pointers on what could be happening here ?

Regards,
Ahmad


-----Original Message-----
From: squid-users <[hidden email]> On Behalf Of Amos Jeffries
Sent: Saturday, February 9, 2019 10:20 AM
To: [hidden email]
Subject: Re: [squid-users] High response times with Squid

On 8/02/19 7:30 pm, Ahmad, Sarfaraz wrote:

> Hi,
>
>  
>
> I am using Squid 4.5 with WCCP. Intercepting SSL by peeking at step1
> and then deciding to either splice or bump upon the SNI.
>
> I am noticing a weird behavior for some of my TCP connections.  Squid
> is taking over 20s to decide what do with the ClientHello sent by the
> browser. It is only after 20s that it decides to send out a
> ClientHello to the origin server and at the same time reply to the
> client with a ServerHello.
>
> This behavior is hard to reproduce and only some clients are affected.
>
>  
>
> I will try to summarize what I see in cache.log with ALL, 6 debug options.
>
>  
>
> 1)      Squid's INTERCEPTION thread/program receives a TCP SYN from
> workstation.
>
> 2019/02/06 17:23:19.070 kid1| 89,5| Intercept.cc(405) Lookup: address
> BEGIN: me/client= *<SQUID_IP>:*23129, destination/me=
> *<CLIENT/BROWSER_IP>:*58232
>

No. This is looking up the original TCP dst-IP:port in the kernel NAT tables.


>  
>
> 2)      Squid becomes the origin server and sets up the TCP connection.
>

No. The local= log values are a simple statement of the TCP packet values received from the NAT system at (1). Squid is an MITM in this setup, so the client *thinks* it is talking to the origin.

Being an MITM Squid is designed to operate as transparently as possible, but at no time has the abilities of the origin server.


> 2019/02/06 17:23:19.070 kid1| 5,5| AsyncCall.cc(93) ScheduleCall:
> TcpAcceptor.cc(339) will call
> httpsAccept(local*=<ORIGIN_SERVER_ON_THE_INTERNET>*:443
> remote=*<CLIENT/BROWSER_IP>:*58232 FD 40 flags=33, MXID_1101703)
> [call34733258]
>

...

>
> 8)      No ServerHello has been sent back to the client yet, Squid
> starts a TCP connection with the origin server
>
> 2019/02/06 17:23:19.110 kid1| 5,4| AsyncJob.cc(123) callStart:
> Comm::ConnOpener status in: [ job2971439]
>
> 2019/02/06 17:23:19.110 kid1| 5,5| ConnOpener.cc(350) doConnect:
> local=0.0.0.0 remote*=<ORIGIN_SERVER_ON_THE_INTERNET>:*443 flags=1:
> Comm::OK - connected
>
> 2019/02/06 17:23:19.110 kid1| 5,4| ConnOpener.cc(155) cleanFd:
> local=0.0.0.0 remote=<*ORIGIN_SERVER_ON_THE_INTERNET*>:443 flags=1
> closing temp FD 50
>
>  
>
> 9)      Squid starts a TLS session with the remote/origin server,
> sends the ClientHello. A total of 0.4 seconds in Squid sending
> clienthello to origin server. This is probably when Squid decides to
> send back the ServerHello to the browser.

Don't guess. Check.

Either you have step2 / client-first bumping - in which case the Squid serverHello would have been sent to the client at (7).

Or, you have step3 / server-first bumping - in which case Squid cannot send a serverHello to the client until it has received the origin's serverHello. Which still has not yet been received despite your trace ending here.

...

>
> 2019/02/06 17:23:19.111 kid1| 83,5| PeerConnector.cc(123) initialize:
> local=*<SQUID_IP>*:44498 remote=*<ORIGIN_SERVER_ON_THE_INTERNET>*:443
> FD
> 50 flags=1, session=0x14899390
>
>  
>
> So somewhere between Step 8 and Step 9, Squid is taking over 20s.
>

There is only 1 millisecond between those steps.

The client connection was received at 17:23:19.070, your (9) finished at
17:23:19.111 -> so there is your 0.41 seconds. If there is any 20s gap for this transaction it is later in the log part you have not shown.


>
> What could possibly be keeping it busy ?
>

Other transactions? Nothing?

What is going on at (9) is *preparing* to send a TLS clientHello. At the point your log stops it still has not actually been written to the network.

There is actually still a good half of the SSl-Bump process to happen:
 - assemble the Squid clientHello bytes,
 - send that to origin
 - receive origin serverHello
 - validate the origin details
 - HTTP fetch missing certificates (if any)
   - re-validate the origin details (repeat fetch as necessary)
 - formulate the Squid serverHello
 - send that to client
 - receive HTTP request over the secured client connection

... then all the HTTP(S) message processing on the resulting connection.


>
> UPDATE: This problem statement seems local to a few websites. Outside
> of the proxy, those websites quite quickly as is expected.
>
>
> Any thoughts on where to look ?  other bits and pieces I could check ?
>  I have jumbo frames enabled (9000 bytes) but am running the proxies
> at
> L2 1500 MTU.
>

It is hard to say without also knowing your squid.conf settings and what sites specifically you are having trouble with,

Could be anything from a packet loss in some remote router halfway around the world. To misconfiguration of something in your network. To misconfiguration by the origin server admin.

IMO that last one is most likely if the behaviour is only delay (not
errors) and with only certain domains.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: High response times with Squid

Amos Jeffries
Administrator
On 14/02/19 11:38 pm, Ahmad, Sarfaraz wrote:

> Hi again,
> I made some progress on this.
> To reiterate, I am peeking at the SNI and then bump all connections to the origin server in context of this problem. ( the origin server is seamless.com )
>
> Here are the new findings ,
> 1) The 20sec lag is noticed even when I splice the connection.
> 2) It 99% has to do with the following slow ACL acl.
>
> acl deny_explicit_dstdomain dstdomain "/etc/squid/acls/deny_explicit_dstdomain"
>
> I see PTR lookups failing when Squid tries to validate my ACLs. When I disable that ACL, the 20second lag is gone. So I am pretty confident that subsequent PTR lookups are causing the delay here.
> I don't see a configuration directive with which I can configure how many times Squid retries the lookup.
> I see one that sets the timeout though (dns_timeout  defaults 30 seconds).
>
> Could you guys give me some pointers on what could be happening here ?

Only repeat back to you what you have described to us ... DNS PTR
lookups are slow.

Your squid.conf is needed to know where those lookups are happening and
see if any can be avoided.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: High response times with Squid

Ahmad, Sarfaraz
Thanks for all the pointers :) I figured it out. Seamless.com's PTR lookups are slow and end up in SERVFAIL.
And that was causing the delay here. I purged that ACL and it's all good.


-----Original Message-----
From: Amos Jeffries <[hidden email]>
Sent: Friday, February 15, 2019 9:24 AM
To: Ahmad, Sarfaraz <[hidden email]>; [hidden email]
Subject: Re: [squid-users] High response times with Squid

On 14/02/19 11:38 pm, Ahmad, Sarfaraz wrote:

> Hi again,
> I made some progress on this.
> To reiterate, I am peeking at the SNI and then bump all connections to
> the origin server in context of this problem. ( the origin server is
> seamless.com )
>
> Here are the new findings ,
> 1) The 20sec lag is noticed even when I splice the connection.
> 2) It 99% has to do with the following slow ACL acl.
>
> acl deny_explicit_dstdomain dstdomain "/etc/squid/acls/deny_explicit_dstdomain"
>
> I see PTR lookups failing when Squid tries to validate my ACLs. When I disable that ACL, the 20second lag is gone. So I am pretty confident that subsequent PTR lookups are causing the delay here.
> I don't see a configuration directive with which I can configure how many times Squid retries the lookup.
> I see one that sets the timeout though (dns_timeout  defaults 30 seconds).
>
> Could you guys give me some pointers on what could be happening here ?

Only repeat back to you what you have described to us ... DNS PTR lookups are slow.

Your squid.conf is needed to know where those lookups are happening and see if any can be avoided.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: High response times with Squid

Eliezer Croitoru
In reply to this post by Ahmad, Sarfaraz
You can replace them with dstdom_regex which will not trigger a PTR lookup.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]


-----Original Message-----
From: squid-users <[hidden email]> On Behalf Of Ahmad, Sarfaraz
Sent: Thursday, February 14, 2019 12:39
To: Amos Jeffries <[hidden email]>; [hidden email]
Subject: Re: [squid-users] High response times with Squid

Hi again,
I made some progress on this.
To reiterate, I am peeking at the SNI and then bump all connections to the origin server in context of this problem. ( the origin server is seamless.com )

Here are the new findings ,
1) The 20sec lag is noticed even when I splice the connection.
2) It 99% has to do with the following slow ACL acl.

acl deny_explicit_dstdomain dstdomain "/etc/squid/acls/deny_explicit_dstdomain"

I see PTR lookups failing when Squid tries to validate my ACLs. When I disable that ACL, the 20second lag is gone. So I am pretty confident that subsequent PTR lookups are causing the delay here.
I don't see a configuration directive with which I can configure how many times Squid retries the lookup.
I see one that sets the timeout though (dns_timeout  defaults 30 seconds).

Could you guys give me some pointers on what could be happening here ?

Regards,
Ahmad


-----Original Message-----
From: squid-users <[hidden email]> On Behalf Of Amos Jeffries
Sent: Saturday, February 9, 2019 10:20 AM
To: [hidden email]
Subject: Re: [squid-users] High response times with Squid

On 8/02/19 7:30 pm, Ahmad, Sarfaraz wrote:

> Hi,
>
>  
>
> I am using Squid 4.5 with WCCP. Intercepting SSL by peeking at step1
> and then deciding to either splice or bump upon the SNI.
>
> I am noticing a weird behavior for some of my TCP connections.  Squid
> is taking over 20s to decide what do with the ClientHello sent by the
> browser. It is only after 20s that it decides to send out a
> ClientHello to the origin server and at the same time reply to the
> client with a ServerHello.
>
> This behavior is hard to reproduce and only some clients are affected.
>
>  
>
> I will try to summarize what I see in cache.log with ALL, 6 debug options.
>
>  
>
> 1)      Squid's INTERCEPTION thread/program receives a TCP SYN from
> workstation.
>
> 2019/02/06 17:23:19.070 kid1| 89,5| Intercept.cc(405) Lookup: address
> BEGIN: me/client= *<SQUID_IP>:*23129, destination/me=
> *<CLIENT/BROWSER_IP>:*58232
>

No. This is looking up the original TCP dst-IP:port in the kernel NAT tables.


>  
>
> 2)      Squid becomes the origin server and sets up the TCP connection.
>

No. The local= log values are a simple statement of the TCP packet values received from the NAT system at (1). Squid is an MITM in this setup, so the client *thinks* it is talking to the origin.

Being an MITM Squid is designed to operate as transparently as possible, but at no time has the abilities of the origin server.


> 2019/02/06 17:23:19.070 kid1| 5,5| AsyncCall.cc(93) ScheduleCall:
> TcpAcceptor.cc(339) will call
> httpsAccept(local*=<ORIGIN_SERVER_ON_THE_INTERNET>*:443
> remote=*<CLIENT/BROWSER_IP>:*58232 FD 40 flags=33, MXID_1101703)
> [call34733258]
>

...

>
> 8)      No ServerHello has been sent back to the client yet, Squid
> starts a TCP connection with the origin server
>
> 2019/02/06 17:23:19.110 kid1| 5,4| AsyncJob.cc(123) callStart:
> Comm::ConnOpener status in: [ job2971439]
>
> 2019/02/06 17:23:19.110 kid1| 5,5| ConnOpener.cc(350) doConnect:
> local=0.0.0.0 remote*=<ORIGIN_SERVER_ON_THE_INTERNET>:*443 flags=1:
> Comm::OK - connected
>
> 2019/02/06 17:23:19.110 kid1| 5,4| ConnOpener.cc(155) cleanFd:
> local=0.0.0.0 remote=<*ORIGIN_SERVER_ON_THE_INTERNET*>:443 flags=1
> closing temp FD 50
>
>  
>
> 9)      Squid starts a TLS session with the remote/origin server,
> sends the ClientHello. A total of 0.4 seconds in Squid sending
> clienthello to origin server. This is probably when Squid decides to
> send back the ServerHello to the browser.

Don't guess. Check.

Either you have step2 / client-first bumping - in which case the Squid serverHello would have been sent to the client at (7).

Or, you have step3 / server-first bumping - in which case Squid cannot send a serverHello to the client until it has received the origin's serverHello. Which still has not yet been received despite your trace ending here.

...

>
> 2019/02/06 17:23:19.111 kid1| 83,5| PeerConnector.cc(123) initialize:
> local=*<SQUID_IP>*:44498 remote=*<ORIGIN_SERVER_ON_THE_INTERNET>*:443
> FD
> 50 flags=1, session=0x14899390
>
>  
>
> So somewhere between Step 8 and Step 9, Squid is taking over 20s.
>

There is only 1 millisecond between those steps.

The client connection was received at 17:23:19.070, your (9) finished at
17:23:19.111 -> so there is your 0.41 seconds. If there is any 20s gap for this transaction it is later in the log part you have not shown.


>
> What could possibly be keeping it busy ?
>

Other transactions? Nothing?

What is going on at (9) is *preparing* to send a TLS clientHello. At the point your log stops it still has not actually been written to the network.

There is actually still a good half of the SSl-Bump process to happen:
 - assemble the Squid clientHello bytes,
 - send that to origin
 - receive origin serverHello
 - validate the origin details
 - HTTP fetch missing certificates (if any)
   - re-validate the origin details (repeat fetch as necessary)
 - formulate the Squid serverHello
 - send that to client
 - receive HTTP request over the secured client connection

... then all the HTTP(S) message processing on the resulting connection.


>
> UPDATE: This problem statement seems local to a few websites. Outside
> of the proxy, those websites quite quickly as is expected.
>
>
> Any thoughts on where to look ?  other bits and pieces I could check ?
>  I have jumbo frames enabled (9000 bytes) but am running the proxies
> at
> L2 1500 MTU.
>

It is hard to say without also knowing your squid.conf settings and what sites specifically you are having trouble with,

Could be anything from a packet loss in some remote router halfway around the world. To misconfiguration of something in your network. To misconfiguration by the origin server admin.

IMO that last one is most likely if the behaviour is only delay (not
errors) and with only certain domains.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users