Huge amount of time_wait connections after upgrade from v2 to v3


Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov
Hi!

We recently upgraded from Squid v2 to v3 and now see a huge increase in connections in the TIME_WAIT state on our Squid servers (we verified that these are client connections).

Below are the versions and the number of such connections under the same load with the same configs (apart from some incompatible options):

squid 2.7.STABLE9

configure options:  '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--exec_prefix=/usr' '--bindir=/usr/sbin' '--libexecdir=/usr/lib/squid' '--localstatedir=/var' '--datadir=/usr/share' '--sysconfdir=/etc/squid' '--enable-epoll' '--enable-removal-policies=heap,lru' '--enable-storeio=aufs' '--enable-delay-pools' '--with-pthreads' '--enable-cache-digests' '--enable-useragent-log' '--enable-referer-log' '--with-large-files' '--with-maxfd=16384' '--enable-err-languages=English'

# netstat -tn | grep TIME_WAIT | grep 3128 | wc -l
95

squid 3.5.25

configure options:  '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/sbin' '--sbindir=/usr/sbin' '--sysconfdir=/etc/squid' '--libdir=/usr/lib' '--libexecdir=/usr/lib/squid' '--includedir=/usr/include' '--datadir=/usr/share' '--sharedstatedir=/usr/com' '--localstatedir=/var' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-epoll' '--enable-removal-policies=heap,lru' '--enable-storeio=aufs' '--enable-delay-pools' '--with-pthreads' '--enable-cache-digests' '--enable-useragent-log' '--enable-referer-log' '--with-large-files' '--with-maxfd=16384' '--enable-err-languages=English' '--enable-htcp'

# netstat -tn | grep TIME_WAIT | grep 3128 | wc -l
11277

Config:

http_port 0.0.0.0:3128

acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443

acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl Safe_ports port 1025-65535  # unregistered ports

acl CONNECT method CONNECT

### START CUSTOM
acl Purge_method method PURGE

# Allow localhost to selectively flush the cache
http_access allow localhost Purge_method
http_access deny Purge_method
### END CUSTOM

### ALLOW ACCESS TO ALL PORTS
# http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager

http_access allow localnet
http_access allow localhost
http_access deny all

### START CUSTOM
# Disable icp
icp_port 0
# Allow ICP queries from local networks only
icp_access allow localnet
icp_access allow localhost
icp_access deny all

# Disable htcp
htcp_port 0
# Allow HTCP queries from local networks only
htcp_access allow localnet
htcp_access allow localhost
htcp_access deny all

# Check for custom request header
acl custom_acl req_header x-use-custom-proxy -i true
# Check for x-use-new-proxy request header
acl custom_new_acl req_header x-use-new-proxy -i true

# first_proxy
cache_peer 127.0.0.1 parent 18070 0 no-query no-digest name=first_proxy
cache_peer_access first_proxy deny custom_acl
cache_peer_access first_proxy deny custom_new_acl

# second_proxy
cache_peer 127.0.0.1 parent 18079 0 no-query no-digest name=second_proxy
cache_peer_access second_proxy allow custom_acl
cache_peer_access second_proxy allow custom_new_acl
cache_peer_access second_proxy deny all

never_direct allow all

cache_mem 4620591 KB
maximum_object_size_in_memory 8 KB
memory_replacement_policy heap LRU
cache_replacement_policy heap LRU

cache_dir aufs /mnt/services/squid/cache 891289 16 256

minimum_object_size 64 bytes # non-zero so we don't cache mistakes
maximum_object_size 102400 KB

logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %tr "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %un %Sh/%<A %mt

access_log stdio:/var/log/squid/access.log combined
cache_log /var/log/squid/cache.log
cache_store_log none
logfile_rotate 0

client_db off

pid_filename /var/run/squid.pid


coredump_dir /var/cache
### END CUSTOM

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
# refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

### START CUSTOM
# don't cache errors
negative_ttl 0 minutes
# always fetch object from the beginning regardless of Range requests
range_offset_limit none
cache_effective_user squid
cache_effective_group squid
max_filedescriptors 524288
via off
forwarded_for delete
### END CUSTOM

We tried "half_closed_clients on" but it didn't help.

Any ideas?

Thanks.

--
With best regards, Ivan Larionov.


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Dieter Bloms-2
Hi Ivan,

On Tue, Jun 06, Ivan Larionov wrote:

> We recently updated from squid v2 to v3 and now see huge increase in
> connections in TIME_WAIT state on our squid servers (verified that this is
> clients connections).

I can confirm this since 3.5.22, for the connections to our ICAP scanners.
With 3.5.21 we had no problems on the SLES11 SP4 operating system.
We did some tests with RHEL7 and saw far fewer TIME_WAIT connections.
Do you use an older operating system?


--
Regards

  Dieter

--
I do not get viruses because I do not use MS software.
If you use Outlook then please do not put my email address in your
address-book so that WHEN you get a virus it won't use my address in the
From field.

Re: Huge amount of time_wait connections after upgrade from v2 to v3

Amos Jeffries
Administrator
In reply to this post by Ivan Larionov
On 07/06/17 12:13, Ivan Larionov wrote:
> Hi!
>
> We recently updated from squid v2 to v3 and now see huge increase in
> connections in TIME_WAIT state on our squid servers (verified that
> this is clients connections).

The biggest change between 2.7 and 3.5 in this area is that 2.7 was
HTTP/1.0, which closed TCP connections after each request by default, while
3.5 is HTTP/1.1, which does not. So connections are more likely to
persist until they hit some TCP timeout and then enter the slow TIME_WAIT
process.

There were also some other bugs identified in older 3.5 releases which
increased the TIME_WAIT counts specifically. I thought those were almost all
fixed by now, but YMMV as to whether you hit the remaining issues.
  A workaround is to set client_idle_pconn_timeout
<http://www.squid-cache.org/Doc/config/client_idle_pconn_timeout/> to a
shorter value than the default of 2 minutes, e.g. 30 seconds or so.
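
For example (value illustrative only), in squid.conf:

client_idle_pconn_timeout 30 seconds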



>
> See versions and amount of such connections under the same load with
> the same configs (except some incompatible stuff):
>
> squid 2.7.STABLE9
>
> configure options:  '--program-prefix=' '--prefix=/usr'
> '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin'
> '--sysconfdir=/etc' '--includedir=/usr/include' '--libdir=/usr/lib'
> '--libexecdir=/usr/libexec' '--sharedstatedir=/usr/com'
> '--mandir=/usr/share/man' '--infodir=/usr/share/info'
> '--exec_prefix=/usr' '--bindir=/usr/sbin'
> '--libexecdir=/usr/lib/squid' '--localstatedir=/var'
> '--datadir=/usr/share' '--sysconfdir=/etc/squid' '--enable-epoll'
> '--enable-removal-policies=heap,lru' '--enable-storeio=aufs'
> '--enable-delay-pools' '--with-pthreads' '--enable-cache-digests'
> '--enable-useragent-log' '--enable-referer-log' '--with-large-files'
> '--with-maxfd=16384' '--enable-err-languages=English'
>
> # netstat -tn | grep TIME_WAIT | grep 3128 | wc -l
> 95
>
> squid 3.5.25
>
> configure options:  '--program-prefix=' '--prefix=/usr'
> '--exec-prefix=/usr' '--bindir=/usr/sbin' '--sbindir=/usr/sbin'
> '--sysconfdir=/etc/squid' '--libdir=/usr/lib'
> '--libexecdir=/usr/lib/squid' '--includedir=/usr/include'
> '--datadir=/usr/share' '--sharedstatedir=/usr/com'
> '--localstatedir=/var' '--mandir=/usr/share/man'
> '--infodir=/usr/share/info' '--enable-epoll'
> '--enable-removal-policies=heap,lru' '--enable-storeio=aufs'
> '--enable-delay-pools' '--with-pthreads' '--enable-cache-digests'
> '--enable-useragent-log' '--enable-referer-log' '--with-large-files'
> '--with-maxfd=16384' '--enable-err-languages=English' '--enable-htcp'

FYI, these options are not doing anything for Squid-3:
   '--enable-useragent-log' '--enable-referer-log'
'--enable-err-languages=English'


>
> # netstat -tn | grep TIME_WAIT | grep 3128 | wc -l
> 11277
>
> Config:
>
> http_port 0.0.0.0:3128
>
> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl localnet src fc00::/7       # RFC 4193 local private network range
> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
>
> acl SSL_ports port 443
>
> acl Safe_ports port 80          # http
> acl Safe_ports port 21          # ftp
> acl Safe_ports port 443         # https
> acl Safe_ports port 70          # gopher
> acl Safe_ports port 210         # wais
> acl Safe_ports port 280         # http-mgmt
> acl Safe_ports port 488         # gss-http
> acl Safe_ports port 591         # filemaker
> acl Safe_ports port 777         # multiling http
> acl Safe_ports port 1025-65535  # unregistered ports
>
> acl CONNECT method CONNECT
>
> ### START CUSTOM
> acl Purge_method method PURGE
>
> # Allow localhost to selectively flush the cache
> http_access allow localhost Purge_method
> http_access deny Purge_method
> ### END CUSTOM
>
> ### ALLOW ACCESS TO ALL PORTS
> # http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access allow localhost manager
> http_access deny manager
>
> http_access allow localnet
> http_access allow localhost
> http_access deny all
>
> ### START CUSTOM
> # Disable icp
> icp_port 0
> # Allow ICP queries from local networks only
> icp_access allow localnet
> icp_access allow localhost
> icp_access deny all
>
> # Disable htcp
> htcp_port 0
> # Allow HTCP queries from local networks only
> htcp_access allow localnet
> htcp_access allow localhost
> htcp_access deny all

FYI: setting icp_access and htcp_access is pointless when the relevant
port is 0. A port of 0 disables the entire component.
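
In other words (sketch only), that whole block can shrink to just:

# ICP and HTCP disabled entirely; no *_access rules needed
icp_port 0
htcp_port 0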

>
> # Check for custom request header
> acl custom_acl req_header x-use-custom-proxy -i true
> # Check for x-use-new-proxy request header
> acl custom_new_acl req_header x-use-new-proxy -i true
>
> # first_proxy
> cache_peer 127.0.0.1 parent 18070 0 no-query no-digest name=first_proxy
> cache_peer_access first_proxy deny custom_acl
> cache_peer_access first_proxy deny custom_new_acl
>
> # second_proxy
> cache_peer 127.0.0.1 parent 18079 0 no-query no-digest name=second_proxy
> cache_peer_access second_proxy allow custom_acl
> cache_peer_access second_proxy allow custom_new_acl
> cache_peer_access second_proxy deny all
>
> never_direct allow all
>
> cache_mem 4620591 KB
> maximum_object_size_in_memory 8 KB
> memory_replacement_policy heap LRU
> cache_replacement_policy heap LRU
>
> cache_dir aufs /mnt/services/squid/cache 891289 16 256
>
> minimum_object_size 64 bytes # non-zero so we don't cache mistakes
> maximum_object_size 102400 KB
>
> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %tr
> "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
> logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %un %Sh/%<A %mt

Please do not re-define these formats. If you want the default formats,
they are already defined internally by Squid-3; if you want any
customizations, use a different format name.
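
For example (the name "combined_custom" is just an illustration), keep the
built-in formats and define your own under a new name:

logformat combined_custom %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %tr "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log stdio:/var/log/squid/access.log combined_custom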

>
> access_log stdio:/var/log/squid/access.log combined
> cache_log /var/log/squid/cache.log
> cache_store_log none
> logfile_rotate 0
>
> client_db off
>
> pid_filename /var/run/squid.pid
>
>
> coredump_dir /var/cache
> ### END CUSTOM
>
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> # refresh_pattern -i (/cgi-bin/|\?) 0     0%      0

Please do not remove that cgi-bin pattern. It is there to protect the
cache against servers with broken/ancient CGI engines. It is designed
explicitly so that modern dynamic sites which provide proper cacheability
headers can still be stored. So there is no harm, and only benefit, in
leaving it there.
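
That is, restore it ahead of the catch-all rule, along these lines:

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320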


Amos


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov
Hi. Sorry for replying to an old thread; I was on vacation and didn't have a chance to test the proposed solution.

Dieter, yes, I'm on an older CentOS 6 based OS (Amazon Linux), but with a newer 4.9.27 kernel.

Amos, thank you for the suggestions about the configure flags and squid config options; I fixed all the issues you pointed out.

Unfortunately, the following workarounds didn't help:

* client_idle_pconn_timeout 30 seconds
* half_closed_clients on
* client_persistent_connections off
* server_persistent_connections off

However, I assumed that this was a bug and that I could find an older version which worked fine. I started testing from 3.1.x all the way to 3.5.26, and this is what I found:

* All versions before 3.5.21 work fine. There are no issues with a huge amount of TIME_WAIT connections under load.
* 3.5.20 is the last good version.
* 3.5.21 is the first broken version.
* 3.5.23, 3.5.25 and 3.5.26 are broken as well.

This effectively means the bug is in the changes somewhere between 3.5.20 and 3.5.21.

I hope this helps and that you'll be able to find the issue. If you can create a bug report based on this information and post it here, that would be awesome.

Thank you.




--
With best regards, Ivan Larionov.


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Amos Jeffries
Administrator
On 07/07/17 13:55, Ivan Larionov wrote:

> Hi. Sorry that I'm answering to the old thread. I was on vacation and
> didn't have a chance to test the proposed solution.
>
> Dieter, yes, I'm on the old CentOS 6 based OS (Amazon Linux) but with a
> new kernel 4.9.27.
>
> Amos, thank you for the suggestions about configure flags and squid
> config options, I fixed all issues you pointed to.
>
> Unfortunately following workarounds didn't help:
>
> * client_idle_pconn_timeout 30 seconds
> * half_closed_clients on
> * client_persistent_connections off
> * server_persistent_connections off
>

TIME_WAIT is a sign that Squid is following the normal TCP process for
closing connections, and doing so before the remote endpoint closes.

Disabling persistent connections increases the number of connections
going through that process. So you definitely want those settings ON to
reduce the WAIT states.

If the remote end is the one doing the closure, then you will see fewer
TIME_WAIT, but CLOSE_WAIT will increase instead. The trick is finding
the right balance of timeouts on both client and server idle pconns to
get the minimum total of WAIT states. That is network dependent.

Generally though, forward/explicit and intercept proxies want
client_idle_pconn_timeout to be shorter than server_idle_pconn_timeout.
Reverse proxies want the opposite.
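
For example (values purely illustrative, for a forward proxy):

client_idle_pconn_timeout 30 seconds
server_idle_pconn_timeout 60 seconds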



> However I assumed that this is a bug and that I can find older version
> which worked fine. I started testing from 3.1.x all the way to 3.5.26
> and this is what I found:
>
> * All versions until 3.5.21 work fine. There no issues with huge amount
> of TIME_WAIT connections under load.
> * 3.5.20 is the latest stable version.
> * 3.5.21 is the first broken version.
> * 3.5.23, 3.5.25, 3.5.26 are broken as well.
>
> This effectively means that bug is somewhere in between 3.5.20 and 3.5.21.
>
> I hope this helps and I hope you'll be able to find an issue. If you can
> create a bug report based on this information and post it here it would
> be awesome.

The changes in 3.5.21 were fixes to some common crashes and better
caching behaviour. So I expect at least some of the change is due to
higher traffic throughput on proxies previously restricted by those
problems.

Amos

Re: Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov
Thank you for the fast reply.

> On Jul 7, 2017, at 01:10, Amos Jeffries <[hidden email]> wrote:
>
>> On 07/07/17 13:55, Ivan Larionov wrote:
>> Hi. Sorry that I'm answering to the old thread. I was on vacation and didn't have a chance to test the proposed solution.
>> Dieter, yes, I'm on the old CentOS 6 based OS (Amazon Linux) but with a new kernel 4.9.27.
>> Amos, thank you for the suggestions about configure flags and squid config options, I fixed all issues you pointed to.
>> Unfortunately following workarounds didn't help:
>> * client_idle_pconn_timeout 30 seconds
>> * half_closed_clients on
>> * client_persistent_connections off
>> * server_persistent_connections off
>
> TIME_WAIT is a sign that Squid is following the normal TCP process for closing connections, and doing so before the remote endpoint closes.
>
> Disabling persistent connections increases the number of connections going through that process. So you definitely want those settings ON to reduce the WAIT states.
>

I understand that. I just wrote that I tried these options and they had no effect. They neither increased nor decreased the number of TIME_WAIT connections. I removed them when I started testing older versions.

> If the remote end is the one doing the closure, then you will see less TIME_WAIT, but CLOSE_WAIT will increase instead. The trick is in finding the right balance of timeouts on both client and server idle pconn to get the minimum of total WAIT states. That is network dependent.
>
> Generally though forward/explicit and intercept proxies want client_idle_pconn_timeout to be shorter than server_idle_pconn_timeout. Reverse proxy want the opposite.
>
>
>
>> However I assumed that this is a bug and that I can find older version which worked fine. I started testing from 3.1.x all the way to 3.5.26 and this is what I found:
>> * All versions until 3.5.21 work fine. There no issues with huge amount of TIME_WAIT connections under load.
>> * 3.5.20 is the latest stable version.
>> * 3.5.21 is the first broken version.
>> * 3.5.23, 3.5.25, 3.5.26 are broken as well.
>> This effectively means that bug is somewhere in between 3.5.20 and 3.5.21.
>> I hope this helps and I hope you'll be able to find an issue. If you can create a bug report based on this information and post it here it would be awesome.
>
> The changes in 3.5.21 were fixes to some common crashes and better caching behaviour. So I expect at least some of the change is due to higher traffic throughput on proxies previously restricted by those problems.
>

I can't imagine how a throughput increase could result in a 500 times higher TIME_WAIT connection count.

In our prod environment, when we updated from 2.7.x to 3.5.25, we saw an increase from 100 to 10000. This is 100x.

When I was load testing different versions yesterday I was always sending the same RPS to them. Updating from 3.5.20 to 3.5.21 resulted in a jump from 20 to 10000 in the TIME_WAIT count. This is 500x.

I know that TIME_WAIT is fine in general, until you have too many of them.

> Amos

Re: Huge amount of time_wait connections after upgrade from v2 to v3

Eliezer Croitoru
Hey Ivan,

How do you run these tests?
With which application, "ab"?

Thanks,
Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]





Re: Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov

> On Jul 7, 2017, at 07:20, Eliezer Croitoru <[hidden email]> wrote:
>
> Hey Ivan,
>
> How do you run these tests?
> With what application "ab" ?
>

Apache JMeter, with a test case written by our load test engineer. I'm not at work right now so I can't give the exact scenario, but AFAIK we were trying to reproduce our production load, so it should be fairly close to real-life traffic.


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Amos Jeffries
Administrator
In reply to this post by Ivan Larionov
On 08/07/17 02:06, Ivan Larionov wrote:
> Thank you for the fast reply.
>
>> On Jul 7, 2017, at 01:10, Amos Jeffries <[hidden email]> wrote:
>>
>>> On 07/07/17 13:55, Ivan Larionov wrote:
 >>>

>>> However I assumed that this is a bug and that I can find older version which worked fine. I started testing from 3.1.x all the way to 3.5.26 and this is what I found:
>>> * All versions until 3.5.21 work fine. There no issues with huge amount of TIME_WAIT connections under load.
>>> * 3.5.20 is the latest stable version.
>>> * 3.5.21 is the first broken version.
>>> * 3.5.23, 3.5.25, 3.5.26 are broken as well.
>>> This effectively means that bug is somewhere in between 3.5.20 and 3.5.21.
>>> I hope this helps and I hope you'll be able to find an issue. If you can create a bug report based on this information and post it here it would be awesome.
>>
>> The changes in 3.5.21 were fixes to some common crashes and better caching behaviour. So I expect at least some of the change is due to higher traffic throughput on proxies previously restricted by those problems.
>>
>
> I can't imagine how throughput increase could result in 500 times more TIME_WAIT connections count.
>

More requests per second generally means more TCP connection churn.

Also, when going from Squid-2 to Squid-3 there is a change from HTTP/1.0
to HTTP/1.1 and the accompanying switch from MISS to near-HIT
revalidations. Revalidations usually carry only headers, without payload,
so the same bytes/sec can contain orders of magnitude more of those than
MISSes - which is the point of having them.
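
For example (illustrative exchange), a revalidation is little more than:

GET http://example.com/logo.png HTTP/1.1
Host: example.com
If-None-Match: "abc123"

answered by a header-only reply:

HTTP/1.1 304 Not Modified
ETag: "abc123"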


> In our prod environment when we updated from 2.7.x to 3.5.25 we saw increase from 100 to 10000. This is 100x.
>

Compared to what RPS change? Given the above traffic change this may be
reasonable for a v2 to v3 jump. Our own very rough lab tests on old
hardware have shown rates of around 900 RPS for Squid-2 and around
1900 RPS for Squid-3.


> When I was load testing different versions yesterday I was always sending the same amount of RPS to them. Update from 3.5.20 to 3.5.21 resulted in jump from 20 to 10000 TIME_WAIT count. This is 500x.
>
> I know that time_wait is fine in general. Until you have too many of them.
>

At this point I'd check that your testing software supports HTTP/1.1
pipelines. It may be giving you worst-case results with per-message TCP
churn rather than what will occur normally (pipelines of N requests per
TCP connection).
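
One quick sanity check outside JMeter (illustrative command; curl re-uses
its proxy connection when given several URLs on one command line):

curl -sv -x http://127.0.0.1:3128 -o /dev/null -o /dev/null http://example.com/ http://example.com/

If the verbose output mentions re-using the existing connection for the
second request, persistent connections are working; if every request opens
a new connection, you are measuring worst-case churn.
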
Though seeing such a jump between Squid-3 releases is worrying.

Amos

Re: Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov
RPS didn't change. Throughput didn't change. Our prod load is 200-700 RPS per server (it changes during the day) and my load test was a constant 470 RPS.

Clients didn't change. It doesn't matter whether they use HTTP/1.1 or 1.0, because the only thing which changed is the squid version. And as I figured out, it's not actually about the 2.7 to 3.5 update, it's all about the difference between 3.5.20 and 3.5.21.

I'm sorry, but the throughput argument doesn't make sense here. The load pattern didn't change. Squid still handles the same number of requests.

I think I'm going to load test every patch applied to 3.5.21 from this page: http://www.squid-cache.org/Versions/v3/3.5/changesets/SQUID_3_5_21.html so I'll be able to point to the exact change which introduced this behavior. I'll try to do it over the weekend or maybe on Monday.
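
Roughly (file names illustrative; configure options the same as earlier in this thread; patch level may differ):

tar xf squid-3.5.20.tar.gz && cd squid-3.5.20
patch -p1 < ../changeset-NNNNN.patch    # apply the 3.5.21 changesets one at a time
./configure [same options as above] && make -j4 && sudo make install

then re-run the same JMeter scenario against each build and watch:

netstat -tn | grep TIME_WAIT | grep 3128 | wc -l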




--
With best regards, Ivan Larionov.


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Ivan Larionov
Ok, mystery solved.

Patch "HTTP: do not allow Proxy-Connection to override Connection header" changes the behavior. And we indeed send from our clients:

Connection: close
Proxy-Connection: Keep-Alive
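
For reference, this is easy to reproduce outside our clients (illustrative command):

curl -v -x http://127.0.0.1:3128 -H "Connection: close" -H "Proxy-Connection: Keep-Alive" http://example.com/

With 3.5.20 the Proxy-Connection keep-alive wins and the client connection stays open; with 3.5.21 and later the Connection: close wins, so the proxy closes after every reply and the TIME_WAIT sockets pile up.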





--
With best regards, Ivan Larionov.


Re: Huge amount of time_wait connections after upgrade from v2 to v3

Amos Jeffries
Administrator
On 15/07/17 11:28, Ivan Larionov wrote:
> Ok, mystery solved.
>
> Patch "HTTP: do not allow Proxy-Connection to override Connection
> header" changes the behavior. And we indeed send from our clients:
>
> Connection: close
> Proxy-Connection: Keep-Alive
>

Ah. Yes that would lead to trouble.

If you have any influence with the authors of that client software,
please get them to remove the Proxy-Connection header. It should never
be used in HTTP/1.1 traffic. At the very least, if the clients are
still HTTP/1.0, it should mirror the Connection value so that proxies
behave consistently.
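
For example (illustrative request), an HTTP/1.1 client talking to the proxy
should just send:

GET http://example.com/ HTTP/1.1
Host: example.com
Connection: keep-alive

and an HTTP/1.0 client that cannot drop the header should at least keep the
two in agreement:

Connection: keep-alive
Proxy-Connection: keep-alive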

If you need a more authoritative reference:
<https://tools.ietf.org/html/rfc7230#appendix-A.1.2>

Amos