Big HTTP-POST file uploads not working

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Big HTTP-POST file uploads not working

Matthias Weigel

Hi,

i am using Squid as a reverse proxy.
Squid tested is 3.5.20 and 3.5.28 (same result). OS of Squid box is
Centos7 latest.

The backend is some IIS-server with a software "Esko Webcenter" (used in
printing industry).

Everything is working, except large HTTP-POST uploads stall and time
out. Small HTTP-POST uploads work fine.
The amount of "large" or "small" varies and seems to be somehow
dependent on connection speed. For some tests with fast connections, the
limit seems to be around 800 MB, for a slower connection speed more like
50MB. The exact amount of data transfered is not consistent, each test
shows a different value. The software wants to upload up to 3GB.

The upload client is some Javascript/Ajax bloat, that shows a progress
bar. That progess bar stalls at some point and then the Javascript
client shows some generic error message.

Everything works fine without Squid reverse proxy in between.
Everything works fine with Apache as reverse proxy!

In all tests, a simple firewall is in the path, that shows no error. No
IPS, DPI or similar is present. No Selinux on Squid box.

When the upload stalls, i am seeing the receive queue grow on the Squid
box OS:
netstat -anp
Proto Recv-Q Send-Q Local Address           Foreign Address
State       PID/Program name
[...]
tcp6  1443336      0 217.xx.xx.18:2443        77.7.88.123:43252
ESTABLISHED 10512/(squid-1)
[...]

At this point wireshark shows the TCP window at the client shrink to
zero and after some time a tcp teardown.

So squid seems to not read any more from the client tcp stream! Why?


In cache.log i get no error.
In access.log i get the following (each line one test):
1548315375.580 147346 77.7.88.123 TCP_MISS/502 5200 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html
1548315515.773 140151 77.7.88.123 TCP_MISS/502 5200 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html
1548316276.926 154505 77.7.88.123 TCP_MISS/502 5200 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html
1548318410.172 137555 77.7.88.123 TCP_MISS/502 5200 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html
1548318550.384 140082 77.7.88.123 TCP_MISS/502 5200 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html
1548352767.439 141549 77.7.88.123 TCP_MISS/502 5122 POST
https://webcenter.example.com/WebCenter/asyncupload.wcr?action=DO_UPLOAD
- FIRSTUP_PARENT/10.yy.yy.21 text/html


Using some of Squids debug function, the HTTP-POST request of a testfile
upload with size 268435456 bytes seems to look like this:
POST /WebCenter/asyncupload.wcr?action=DO_UPLOAD HTTP/1.1^M
Host: webcenter.example.com^M
Connection: keep-alive^M
Content-Length: 268436133^M
Accept: application/json, text/javascript, */*; q=0.01^M
Origin: https://webcenter.example.com^M
X-Requested-With: XMLHttpRequest^M
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36^M
DNT: 1^M
Content-Type: multipart/form-data;
boundary=----WebKitFormBoundaryoYIJ0e8lVMdsNIh4^M
Referer:
https://webcenter.example.com/WebCenter/projdetailsdocsadddocapplet.jsp?projectID=00002_0000028952&folderID=00002_0000363216&menu_file=addnewdocument^M
Accept-Encoding: gzip, deflate, br^M
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7^M
Cookie:
JSESSIONID=3B3B60B9E2EF7F74700AF4877C17F6A9C43E76DEF8F219839EEC3779B310124C15EEE4CF57CB8437E6FD5FFE8D76EF5BE79D2304D1023C4F6A539D0F619A245D;
/WebCenter/projdetails.jsp=; /WebCenter/projdetailsdocs.jsp=;
ASPSESSIONIDQERDBCDC=OPHNNBAADGNBKMBPGGKAFIOI^M


Squid forwards it to the backend server like this:
POST /WebCenter/asyncupload.wcr?action=DO_UPLOAD HTTP/1.1^M
Content-Length: 268436133^M
Accept: application/json, text/javascript, */*; q=0.01^M
Origin: https://webcenter.example.com^M
X-Requested-With: XMLHttpRequest^M
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36^M
DNT: 1^M
Content-Type: multipart/form-data;
boundary=----WebKitFormBoundaryoYIJ0e8lVMdsNIh4^M
Referer:
https://webcenter.example.com/WebCenter/projdetailsdocsadddocapplet.jsp?projectID=00002_0000028952&folderID=00002_0000363216&menu_file=addnewdocument^M
Accept-Encoding: gzip, deflate, br^M
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7^M
Cookie:
JSESSIONID=3B3B60B9E2EF7F74700AF4877C17F6A9C43E76DEF8F219839EEC3779B310124C15EEE4CF57CB8437E6FD5FFE8D76EF5BE79D2304D1023C4F6A539D0F619A245D;
/WebCenter/projdetails.jsp=; /WebCenter/projdetailsdocs.jsp=;
ASPSESSIONIDQERDBCDC=OPHNNBAADGNBKMBPGGKAFIOI^M
Host: webcenter.example.com^M
Via: 1.1 HFKGN062.example.com (squid/3.5.28)^M
Surrogate-Capability: HFKGN062.example.com="Surrogate/1.0 ESI/1.0"^M
X-Forwarded-For: 77.7.88.123^M
Cache-Control: max-age=259200^M
Connection: keep-alive^M
Front-End-Https: On^M
^M

In the debug logs i find the following error message:
X-Squid-Error : ERR_READ_ERROR 104

I tried to change various kernel parameters like tcp_rmem, rmem_max,
etc. Results in no change to the problem.

Any ideas what is causing this?
How can i further track down the problem?
Is anybody on this list able to do HTTP-POST with gigabyte sized files
through squid?

TIA

Matthias


P.S. here is my squid.conf:
acl SSL_ports port 443
acl CONNECT method CONNECT
acl trace method TRACE
acl webcenter dstdomain webcenter.example.com
acl my443 myport 443
acl my2443 myport 2443
http_access deny trace
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow webcenter
http_access allow localhost
http_access deny all
icp_access allow all
http_port 80 accel
https_port  2443 accel vport=443 cert=/etc/squid/certs/webcenter.crt.pem
key=/etc/squid/certs/webcenter.key.pem options=NO_SSLv2,NO_SSLv3
ssl_unclean_shutdown on
cache_peer 10.yy.yy.21 parent 443 0 originserver no-digest
front-end-https=on no-query ssl sslflags=DONT_VERIFY_PEER
ssloptions=NO_SSLv2,NO_SSLv3 login=PASS
cache_peer_access 10.yy.yy.21 allow webcenter
cache_peer_access 10.yy.yy.21 deny all
cache_mem 250 MB
maximum_object_size_in_memory 1 MB
cache_dir ufs /var/spool/squid 10000 48 256
maximum_object_size 40960 KB
access_log /var/log/squid/access.log squid
cache_store_log none
debug_options ALL,1
strip_query_terms off
redirect_program /etc/squid/urlmapper.pl
redirect_children 10
redirect_rewrites_host_header on
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
request_header_max_size 200 KB
reply_header_max_size 200 KB
request_body_max_size 3 GB
peer_connect_timeout 10 seconds
request_timeout 5 minutes
persistent_request_timeout 5 minutes
half_closed_clients on
pconn_timeout 10 minute
shutdown_lifetime 60 seconds
cache_mgr [hidden email]
detect_broken_pconn on
deny_info TCP_RESET trace
always_direct deny all
never_direct allow all
hosts_file /etc/hosts
coredump_dir /var/spool/squid
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Alex Rousskov
On 1/27/19 3:42 AM, Matthias Weigel wrote:

> large HTTP-POST uploads stall and time out.

> squid seems to not read any more from the client tcp stream! Why?


A Squid developer probably can find the answer to that question by
examining cache.log with debug_options set to ALL,9.


> In the debug logs i find the following error message:
> X-Squid-Error : ERR_READ_ERROR 104

Does ERR_READ_ERROR first appear in cache.log before or after the client
times out?


> How can i further track down the problem?

Post (a link to) compressed cache.log with debug_options set to ALL,9
and reproducing the problem with a single transaction without any other
traffic going through the Squid box.

Alex.
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Matthias Weigel
Hi Alex,

this problem just got a lot weirder:
- Upload works fine with debug_options ALL,7 ALL,8 or ALL,9.
- Upload fails with debug_options ALL,6 and lower.

I created a debug_options ALL,6 cache.log.
http://94.16.117.186/squid/cache.log.3b.gz

During this test, the following happened at the client:

Stall at 20:10:38
Recover at 20:12:47
Stall at 20:14:11
Failure at 20:16:17

Any ideas?

Best Regards

Matthias

Am 27.01.2019 um 18:19 schrieb Alex Rousskov:

> On 1/27/19 3:42 AM, Matthias Weigel wrote:
>
>> large HTTP-POST uploads stall and time out.
>
>> squid seems to not read any more from the client tcp stream! Why?
>
>
> A Squid developer probably can find the answer to that question by
> examining cache.log with debug_options set to ALL,9.
>
>
>> In the debug logs i find the following error message:
>> X-Squid-Error : ERR_READ_ERROR 104
>
> Does ERR_READ_ERROR first appear in cache.log before or after the client
> times out?
>
>
>> How can i further track down the problem?
>
> Post (a link to) compressed cache.log with debug_options set to ALL,9
> and reproducing the problem with a single transaction without any other
> traffic going through the Squid box.
>
> Alex.
> _______________________________________________
> squid-users mailing list
> [hidden email]
> http://lists.squid-cache.org/listinfo/squid-users
>

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Amos Jeffries
Administrator
On 29/01/19 8:50 am, Matthias Weigel wrote:

> Hi Alex,
>
> this problem just got a lot weirder:
> - Upload works fine with debug_options ALL,7 ALL,8 or ALL,9.
> - Upload fails with debug_options ALL,6 and lower.
>
> I created a debug_options ALL,6 cache.log.
> http://94.16.117.186/squid/cache.log.3b.gz
>
> During this test, the following happened at the client:
>
> Stall at 20:10:38
> Recover at 20:12:47
> Stall at 20:14:11
> Failure at 20:16:17
>
> Any ideas?
>

The FD 18 client->Squid I/O stops happening at 20:10:35.923 with input
buffer full and more data apparently waiting to arrive.

But the FD 20 Squid->server I/O halted earlier at 20:10:35.437 with a
4KB packet being sent and still waiting for the signal to proceed
sending more.

At 20:12:47.068 a signal is received finally, but it indicates the
server connection has gone away with no reason given. Squid generates a
502 error response and delivers that to the client.

The client opens a new connection, does all its TLS handshakes again and
re-sends the same very large POST request. Which encounters the same
sequence of server stops responding, client buffer fills and "stall"
until the server FD is closed. Then again the 502 from Squid but the
client does not resume this time.


So whatever is going on it looks to me like it is the origin server
being a problem. Though you may need to look at the actual TCP packet
flow to confirm. I expect you will find that a packet from Squid to the
server does not get ACK'd, or a network timeout occurs (NAT or TCP
lifetimes are common), ...

 Or maybe the server is itself dealing with a similar backlog from
unresponsive internal agent/component leaving its input buffer to
overflow just like Squids one from the client does.


Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Matthias Weigel
Hi Amos,

thanks for your analysing!

Any idea, why the IIS server behaves differently with Apache as the
Reverse Proxy (anything else being the same)?

Does Squid do anything special with TCP streams?

Best Regards

Matthias

Am 29.01.2019 um 05:01 schrieb Amos Jeffries:

> On 29/01/19 8:50 am, Matthias Weigel wrote:
>> Hi Alex,
>>
>> this problem just got a lot weirder:
>> - Upload works fine with debug_options ALL,7 ALL,8 or ALL,9.
>> - Upload fails with debug_options ALL,6 and lower.
>>
>> I created a debug_options ALL,6 cache.log.
>> http://94.16.117.186/squid/cache.log.3b.gz
>>
>> During this test, the following happened at the client:
>>
>> Stall at 20:10:38
>> Recover at 20:12:47
>> Stall at 20:14:11
>> Failure at 20:16:17
>>
>> Any ideas?
>>
>
> The FD 18 client->Squid I/O stops happening at 20:10:35.923 with input
> buffer full and more data apparently waiting to arrive.
>
> But the FD 20 Squid->server I/O halted earlier at 20:10:35.437 with a
> 4KB packet being sent and still waiting for the signal to proceed
> sending more.
>
> At 20:12:47.068 a signal is received finally, but it indicates the
> server connection has gone away with no reason given. Squid generates a
> 502 error response and delivers that to the client.
>
> The client opens a new connection, does all its TLS handshakes again and
> re-sends the same very large POST request. Which encounters the same
> sequence of server stops responding, client buffer fills and "stall"
> until the server FD is closed. Then again the 502 from Squid but the
> client does not resume this time.
>
>
> So whatever is going on it looks to me like it is the origin server
> being a problem. Though you may need to look at the actual TCP packet
> flow to confirm. I expect you will find that a packet from Squid to the
> server does not get ACK'd, or a network timeout occurs (NAT or TCP
> lifetimes are common), ...
>
>  Or maybe the server is itself dealing with a similar backlog from
> unresponsive internal agent/component leaving its input buffer to
> overflow just like Squids one from the client does.
>
>
> Amos
> _______________________________________________
> squid-users mailing list
> [hidden email]
> http://lists.squid-cache.org/listinfo/squid-users
>

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Amos Jeffries
Administrator
On 29/01/19 8:05 pm, Matthias Weigel wrote:
> Hi Amos,
>
> thanks for your analysing!
>
> Any idea, why the IIS server behaves differently with Apache as the
> Reverse Proxy (anything else being the same)?
>

No ideas, sorry.

> Does Squid do anything special with TCP streams?
>

Nothing that would really stand out AFAIK. The data is re-packaged into
4096 byte chunks to the server, 4170 bytes after encoding on those
particular streams. Maybe that, but unlikely.

The trace shows a fairly regular flow except at the stall points where
the send-more-data I/O signals just stop arriving.

Amos
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Reply | Threaded
Open this post in threaded view
|

Re: Big HTTP-POST file uploads not working

Alex Rousskov
In reply to this post by Matthias Weigel
On 1/29/19 12:05 AM, Matthias Weigel wrote:

> Any idea, why the IIS server behaves differently with Apache as the
> Reverse Proxy (anything else being the same)?

Given Amos' analysis, if you want to get to the bottom of this problem,
you would need to figure out why the origin server drops the connection.
This may take a while, but is usually possible by enabling origin server
debugging, analyzing packet dumps, comparing working cases with
non-working ones, and/or, in the extreme cases, playing with how Squid
sends data. It may take a pro a few hours or a few days, but it is
usually doable.

One interesting experiment would be to test Squid-Apache-IIS chain.

The analysis can either confirm a Squid bug (something we can fix),
detect a hardware problem (something you can fix), or provide enough
information to develop a workaround for the IIS bug (other than enabling
ALL,9 debugging in Squid :-).

Since ALL,9 debugging has an effect, it is possible that rate-limiting
Squid-IIS and/or client-Squid connection would help work around the problem.

Alex.


> Am 29.01.2019 um 05:01 schrieb Amos Jeffries:
>> On 29/01/19 8:50 am, Matthias Weigel wrote:
>>> Hi Alex,
>>>
>>> this problem just got a lot weirder:
>>> - Upload works fine with debug_options ALL,7 ALL,8 or ALL,9.
>>> - Upload fails with debug_options ALL,6 and lower.
>>>
>>> I created a debug_options ALL,6 cache.log.
>>> http://94.16.117.186/squid/cache.log.3b.gz
>>>
>>> During this test, the following happened at the client:
>>>
>>> Stall at 20:10:38
>>> Recover at 20:12:47
>>> Stall at 20:14:11
>>> Failure at 20:16:17
>>>
>>> Any ideas?
>>>
>>
>> The FD 18 client->Squid I/O stops happening at 20:10:35.923 with input
>> buffer full and more data apparently waiting to arrive.
>>
>> But the FD 20 Squid->server I/O halted earlier at 20:10:35.437 with a
>> 4KB packet being sent and still waiting for the signal to proceed
>> sending more.
>>
>> At 20:12:47.068 a signal is received finally, but it indicates the
>> server connection has gone away with no reason given. Squid generates a
>> 502 error response and delivers that to the client.
>>
>> The client opens a new connection, does all its TLS handshakes again and
>> re-sends the same very large POST request. Which encounters the same
>> sequence of server stops responding, client buffer fills and "stall"
>> until the server FD is closed. Then again the 502 from Squid but the
>> client does not resume this time.
>>
>>
>> So whatever is going on it looks to me like it is the origin server
>> being a problem. Though you may need to look at the actual TCP packet
>> flow to confirm. I expect you will find that a packet from Squid to the
>> server does not get ACK'd, or a network timeout occurs (NAT or TCP
>> lifetimes are common), ...
>>
>>  Or maybe the server is itself dealing with a similar backlog from
>> unresponsive internal agent/component leaving its input buffer to
>> overflow just like Squids one from the client does.
>>
>>
>> Amos
>> _______________________________________________
>> squid-users mailing list
>> [hidden email]
>> http://lists.squid-cache.org/listinfo/squid-users
>>
>
> _______________________________________________
> squid-users mailing list
> [hidden email]
> http://lists.squid-cache.org/listinfo/squid-users
>

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users