squid hanging in 100% steal

squid hanging in 100% steal

Marc
Hi,

For some reason my squid sometimes hangs (after weeks of running
smoothly) in 100% steal, until I kill the process and restart it, after
which the process will again run stable for weeks.

It's running on an AWS EC2 instance, squid version:
squid-3.5.20-10.34.amzn1.x86_64 , see below for some debugging info.
Any idea what could be the problem here? Thanks!

top:
[11:56:49][root@ip-172-31-9-138 ~]# top
top - 11:57:11 up 218 days, 17:36,  1 user,  load average: 1.06, 1.17, 1.09
Tasks:  81 total,   2 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.5%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 95.2%st
Mem:    501220k total,   405748k used,    95472k free,    65512k buffers
Swap:        0k total,        0k used,        0k free,    88948k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29963 squid     20   0  290m 171m 7472 R 99.9 35.1 672:59.73 squid
    1 root      20   0 19648 2480 2148 S  0.0  0.5   0:02.05 init
<snip>

vmstat:
[11:57:39][root@ip-172-31-9-138 ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1      0  95408  65536  89052    0    0     0     4    1    1  0  0 99  0  0
 1  0      0  95408  65536  89040    0    0     0     4   56   36  5  0  0  0 95
 2  0      0  95408  65536  89040    0    0     0     0   54   18  5  0  0  0 95
 1  0      0  95408  65536  89040    0    0     0     0   57   30  5  0  0  0 95
 1  0      0  95408  65536  89040    0    0     0     4   52   25  5  0  0  0 95
 3  0      0  95408  65536  89040    0    0     0     0   52   14  6  0  0  0 94
 1  0      0  95408  65536  89040    0    0     0     0   50   26  4  0  0  0 96
 2  0      0  95408  65536  89040    0    0     0     0   53   21  6  0  0  0 94
 1  0      0  95408  65540  89036    0    0     0    12   62   38  5  0  0  0 95
 2  0      0  95408  65540  89040    0    0     0    36   55   14  5  0  0  0 95
 1  0      0  95408  65540  89040    0    0     0     0   51   34  5  0  0  0 95

gdb:
[11:55:07][root@ip-172-31-9-138 ~]# sudo gdb -n -batch -ex backtrace -pid 29963
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00000000007bca52 in
CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor>
const&) ()
#0  0x00000000007bca52 in
CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor>
const&) ()
#1  0x00000000007bc3d4 in Comm::AcceptLimiter::kick() ()
#2  0x0000000000721867 in AsyncCall::make() ()
#3  0x00000000007259e2 in AsyncCallQueue::fireNext() ()
#4  0x0000000000725e20 in AsyncCallQueue::fire() ()
#5  0x00000000005b0089 in EventLoop::runOnce() ()
#6  0x00000000005b0178 in EventLoop::run() ()
#7  0x00000000006192cc in SquidMain(int, char**) ()
#8  0x0000000000514b3b in main ()

strace:
[11:52:51][root@ip-172-31-9-138 ~]# strace -t -s 8192 -f -p 29963
Process 29963 attached
11:53:00 accept(10, {sa_family=AF_INET6, sin6_port=htons(45756),
inet_pton(AF_INET6, "::ffff:<snip>", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, [28]) = 16
11:53:00 getsockname(16, {sa_family=AF_INET6, sin6_port=htons(3128),
inet_pton(AF_INET6, "::ffff:<snip>", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, [28]) = 0
11:53:00 fcntl(16, F_GETFD)             = 0
11:53:00 fcntl(16, F_SETFD, FD_CLOEXEC) = 0
11:53:00 fcntl(16, F_GETFL)             = 0x2 (flags O_RDWR)
11:53:00 fcntl(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
11:53:00 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 23
11:53:00 ioctl(23, SIOCGARP, 0x7ffd21abeaa0) = -1 ENODEV (No such device)
11:53:00 ioctl(23, SIOCGIFCONF, {120, {{"lo", {AF_INET,
inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("<snip>")}},
{"eth1", {AF_INET, inet_addr("<snip>")}}}}) = 0
11:53:00 ioctl(23, SIOCGARP, 0x7ffd21abeaa0) = -1 ENXIO (No such
device or address)
11:53:00 ioctl(23, SIOCGARP, 0x7ffd21abeaa0) = -1 ENXIO (No such
device or address)
11:53:00 close(23)                      = 0
11:53:00 epoll_ctl(5, EPOLL_CTL_DEL, 27, {0, {u32=27, u64=4294967323}}) = 0
11:53:00 close(27)                      = 0
11:53:03 accept(10, {sa_family=AF_INET6, sin6_port=htons(50050),
inet_pton(AF_INET6, "::ffff:<snip>", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, [28]) = 23
11:53:03 getsockname(23, {sa_family=AF_INET6, sin6_port=htons(3128),
inet_pton(AF_INET6, "::ffff:<snip>", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, [28]) = 0
11:53:03 fcntl(23, F_GETFD)             = 0
11:53:03 fcntl(23, F_SETFD, FD_CLOEXEC) = 0
11:53:03 fcntl(23, F_GETFL)             = 0x2 (flags O_RDWR)
11:53:03 fcntl(23, F_SETFL, O_RDWR|O_NONBLOCK) = 0
11:53:03 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 25
11:53:03 ioctl(25, SIOCGARP, 0x7ffd21abeaa0) = -1 ENODEV (No such device)
11:53:03 ioctl(25, SIOCGIFCONF, {120, {{"lo", {AF_INET,
inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("<snip>")}},
{"eth1", {AF_INET, inet_addr("<snip>")}}}}) = 0
11:53:03 ioctl(25, SIOCGARP, 0x7ffd21abeaa0) = -1 ENXIO (No such
device or address)
11:53:03 ioctl(25, SIOCGARP, 0x7ffd21abeaa0) = -1 ENXIO (No such
device or address)
11:53:03 close(25)                      = 0
11:53:03 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
11:53:03 write(9, "<snip> <snip> <snip> - - [24/Jan/2019:11:52:44
+0000] \"CONNECT <snip>  HTTP/1.1\" 200 0 \"-\" \"Mozilla/5.0 (Windows
NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0\"
TCP_TUNNEL:HIER_DIRECT\n", 223) = 223
11:53:03 epoll_ctl(5, EPOLL_CTL_DEL, 15, {0, {u32=15, u64=4294967311}}) = 0
11:53:03 close(15)                      = 0
^CProcess 29963 detached
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: squid hanging in 100% steal

Amos Jeffries
Administrator
On 25/01/19 1:24 am, Marc wrote:
> Hi,
>
> For some reason my squid sometimes hangs (after weeks of running
> smoothly) in 100% steal, until I kill the process and restart it, after
> which the process will again run stable for weeks.

What does "100% steal" mean?

>
> It's running on an AWS EC2 instance, squid version:
> squid-3.5.20-10.34.amzn1.x86_64 , see below for some debugging info.
> Any idea what could be the problem here? Thanks!
>
> top:
> [11:56:49][root@ip-172-31-9-138 ~]# top
> top - 11:57:11 up 218 days, 17:36,  1 user,  load average: 1.06, 1.17, 1.09
> Tasks:  81 total,   2 running,  79 sleeping,   0 stopped,   0 zombie
> Cpu(s):  4.5%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 95.2%st
> Mem:    501220k total,   405748k used,    95472k free,    65512k buffers
> Swap:        0k total,        0k used,        0k free,    88948k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29963 squid     20   0  290m 171m 7472 R 99.9 35.1 672:59.73 squid
>     1 root      20   0 19648 2480 2148 S  0.0  0.5   0:02.05 init
> <snip>
>
> vmstat:
> [11:57:39][root@ip-172-31-9-138 ~]# vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  1      0  95408  65536  89052    0    0     0     4    1    1  0  0 99  0  0
>  1  0      0  95408  65536  89040    0    0     0     4   56   36  5  0  0  0 95
>  2  0      0  95408  65536  89040    0    0     0     0   54   18  5  0  0  0 95
>  1  0      0  95408  65536  89040    0    0     0     0   57   30  5  0  0  0 95
>  1  0      0  95408  65536  89040    0    0     0     4   52   25  5  0  0  0 95
>  3  0      0  95408  65536  89040    0    0     0     0   52   14  6  0  0  0 94
>  1  0      0  95408  65536  89040    0    0     0     0   50   26  4  0  0  0 96
>  2  0      0  95408  65536  89040    0    0     0     0   53   21  6  0  0  0 94
>  1  0      0  95408  65540  89036    0    0     0    12   62   38  5  0  0  0 95
>  2  0      0  95408  65540  89040    0    0     0    36   55   14  5  0  0  0 95
>  1  0      0  95408  65540  89040    0    0     0     0   51   34  5  0  0  0 95
>
> gdb:
> [11:55:07][root@ip-172-31-9-138 ~]# sudo gdb -n -batch -ex backtrace -pid 29963
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00000000007bca52 in
> CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor>
> const&) ()
> #0  0x00000000007bca52 in
> CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor>
> const&) ()
> #1  0x00000000007bc3d4 in Comm::AcceptLimiter::kick() ()
> #2  0x0000000000721867 in AsyncCall::make() ()
> #3  0x00000000007259e2 in AsyncCallQueue::fireNext() ()
> #4  0x0000000000725e20 in AsyncCallQueue::fire() ()
> #5  0x00000000005b0089 in EventLoop::runOnce() ()
> #6  0x00000000005b0178 in EventLoop::run() ()
> #7  0x00000000006192cc in SquidMain(int, char**) ()
> #8  0x0000000000514b3b in main ()
>

This looks like it may be one of the symptoms of
<https://bugs.squid-cache.org/show_bug.cgi?id=4885>, which was fixed in
the Squid-4.3 release.

Please try the current Squid-4 release to see if the issue is already
resolved. v3.5 is no longer supported, so if it is a bug we will need
traces and a reproduction on a current Squid version (v4 or v5) to
have a realistic chance of anyone being able to fix it.
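
To check quickly whether an installed package predates that fix, something
like the following could be used. This is only a rough sketch: it assumes
package names of the form shown in this thread (squid-<major>.<minor>...),
and the function names squid_version and has_fix are made up for
illustration. It compares the version number only and does not prove that
bug 4885 is the actual cause of the hang.

```python
import re

# Per this reply: bug 4885 was fixed in the Squid-4.3 release.
FIXED_IN = (4, 3)


def squid_version(package):
    """Extract (major, minor) from a name like 'squid-3.5.20-10.34.amzn1.x86_64'."""
    match = re.search(r"squid-(\d+)\.(\d+)", package)
    if match is None:
        raise ValueError(f"no Squid version found in {package!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(package):
    """True if the packaged version is at least the release carrying the fix."""
    return squid_version(package) >= FIXED_IN
```

For the package named in the original post, has_fix() returns False, so an
upgrade is needed before the fix can be ruled in or out.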

Amos

Re: squid hanging in 100% steal

Eliezer Croitoru
In reply to this post by Marc
You can try the latest squid with my repo at:
http://ngtech.co.il/repo/amzn/1/

http://ngtech.co.il/repo/amzn/1/x86_64/squid-4.5-1.amzn1.x86_64.rpm
http://ngtech.co.il/repo/amzn/1/x86_64/squid-helpers-4.5-1.amzn1.x86_64.rpm

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]




Re: [ext] Re: squid hanging in 100% steal

Ralf Hildebrandt
In reply to this post by Amos Jeffries
* Amos Jeffries <[hidden email]>:
> On 25/01/19 1:24 am, Marc wrote:
> > Hi,
> >
> > For some reason my squid sometimes hangs (after weeks of running
> > smoothly) in 100% steal, until I kill the process and restart it, after
> > which the process will again run stable for weeks.
>
> What does "100% steal" mean?

http://blog.scoutapp.com/articles/2013/07/25/understanding-cpu-steal-time-when-should-you-be-worried
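
In short, the "st" column in top and vmstat is the share of time the
hypervisor gave to something other than this guest while the guest had work
ready to run. On Linux it is the eighth counter on the aggregate "cpu" line
of /proc/stat. A minimal sketch of how the percentage can be derived from
two samples of that line follows. The function names are made up for
illustration, and it assumes a kernel new enough to report the steal
counter.

```python
def parse_cpu_line(line):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(value) for value in fields[1:]]
    if len(counters) < 8:
        raise ValueError("this kernel does not report a steal counter")
    return counters


def steal_percent(before, after):
    """Percentage of elapsed CPU time accounted as steal between two samples."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    # Sum only user..steal (the first 8 counters); guest time is already
    # included in user and nice, so adding it again would double count.
    deltas = [n - o for n, o in zip(new[:8], old[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Calling read_cpu_line() twice about a second apart and passing both lines
to steal_percent() should give roughly the same figure as the last column
of "vmstat 1".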

--
Ralf Hildebrandt                   Charite Universitätsmedizin Berlin
[hidden email]        Campus Benjamin Franklin
https://www.charite.de             Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155