Http write cache

Http write cache

Olivier MARCHETTA

Hello,

I recently set up a squid reverse proxy cache for Sharepoint Online with the help of Amos.

It is now accelerating all reads by caching objects in Squid.

Now I’m facing a trickier problem: writing objects to the parent server.

In this case it’s a direct connection, so it is slow.

I am not familiar with all the options and capabilities of the HTTP protocol, but do you know if it is possible to have an asynchronous write-back to the parent server to accelerate the writes?

Thank you.

Olivier Marchetta

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Http write cache

Amos Jeffries
Administrator
On 10/09/17 07:20, Olivier MARCHETTA wrote:

> Hello,
>
> I recently set up a squid reverse proxy cache for Sharepoint Online with
> the help of Amos.
>
> It is now accelerating all reads by caching objects in Squid.
>
> Now I’m facing a trickier problem: writing objects to the parent server.
>
> In this case it’s a direct connection, so it is slow.
>
> I am not familiar with all the options and capabilities of the HTTP
> protocol, but do you know if it is possible to have an asynchronous
> write-back to the parent server to accelerate the writes?

No, HTTP is message oriented. The equivalent of a write is a request
message with a payload (usually PUT or POST).
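To make this concrete, here is a minimal Python sketch of what such a "write" looks like on the wire: a single PUT request message carrying the payload. The host name and path are hypothetical placeholders, and real clients (WebDAV ones included) send more headers; nothing here is specific to Squid or SharePoint.

```python
def build_put_request(host: str, path: str, body: bytes,
                      content_type: str = "application/octet-stream") -> bytes:
    """Serialise an HTTP/1.1 PUT request: request line, headers, blank line, payload."""
    head = (
        f"PUT {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # tells the receiver where the payload ends
        "\r\n"                              # blank line separates headers from payload
    )
    return head.encode("ascii") + body


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    msg = build_put_request("sharepoint.example", "/sites/demo/notes.txt", b"hello")
    print(msg.decode("ascii"))
```

The request message by itself carries no acknowledgement: the client only knows the write succeeded once the matching response (for example 201 Created or 204 No Content) comes back.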

Origin servers can sometimes respond to requests with payload
("uploads") before the request has fully arrived, but any subsequent
network issues are guaranteed to result in data loss - so the practice
is discouraged. It is definitely not safe for a proxy to do so
independent of the origin server.

Amos

Re: Http write cache

Olivier MARCHETTA
Hello,

>Origin servers can sometimes respond to requests with payload ("uploads") before the request has fully arrived, but any subsequent network issues are guaranteed to result in data loss - so the practice is discouraged.

If I understand correctly, when it's a download (GET), Squid will replace the payload with the object in its cache, if it is fresh.
But the HTTP control messages are still coming from the Origin server.
In the case of an upload (PUT), using the Squid cache won't accelerate it,
because the client has to wait for the Origin server's response to the payload transfer (or request).

The only option to make uploads faster is if the Origin server is aware that the client is using a reverse proxy cache and responds to the upload request before the full payload transfer.

Tell me if I'm wrong, but I think that I understand now.
That means that if I want to "bufferize" the writes, it has to happen with another protocol before the WebDAV connection to Sharepoint Online.

Thank you.
Olivier MARCHETTA




-----Original Message-----
From: squid-users [mailto:[hidden email]] On Behalf Of Amos Jeffries
Sent: Sunday, September 10, 2017 4:32 AM
To: [hidden email]
Subject: Re: [squid-users] Http write cache


Re: Http write cache

Amos Jeffries
Administrator
On 10/09/17 21:14, Olivier MARCHETTA wrote:
> Hello,
>
>> Origin servers can sometimes respond to requests with payload ("uploads") before the request has fully arrived, but any subsequent network issues are guaranteed to result in data loss - so the practice is discouraged.
>
> If I understand correctly, when it's a download (GET), Squid will replace the payload with the object in its cache, if it is fresh.

Nod. This is possible because two identical requests can be answered with the same response.

> But the HTTP control messages are still coming from the Origin server.

Not necessarily. There are no "control messages" as such in HTTP. The
cache controls are delivered along with the cached payload to indicate
what can be done with it. Synchronous server contact (aka revalidation)
to deliver responses is only required if those controls say so.
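A simplified Python sketch of that decision: a cached response is reused without contacting the origin while its age is below the freshness lifetime given by `s-maxage` (which shared caches such as Squid prefer) or `max-age`, and `no-cache` forces revalidation. The function name is hypothetical, and the sketch deliberately ignores `Expires`, heuristic freshness, `must-revalidate` and the other directives a real cache also honours.

```python
def needs_revalidation(cache_control: str, age_seconds: int) -> bool:
    """Return True if a shared cache must contact the origin before reusing a response.

    Simplified: only no-cache, s-maxage and max-age are considered.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().lower().partition("=")
        if name:
            directives[name.strip()] = value.strip().strip('"')

    if "no-cache" in directives:
        return True  # stored copy may only be used after revalidation

    # Shared caches prefer s-maxage over max-age.
    lifetime = directives.get("s-maxage", directives.get("max-age"))
    if lifetime is None or not lifetime.isdigit():
        return True  # sketch: no explicit lifetime, so treat as stale

    return age_seconds >= int(lifetime)  # fresh only while age < lifetime
```

For example, a response stored with `Cache-Control: max-age=300` can be served straight from the cache for its first five minutes; after that the cache has to check back with the origin.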


> In the case of an upload (PUT), using the Squid cache won't accelerate it,
> because the client has to wait for the Origin server's response to the payload transfer (or request).

Yes. Squid has never seen the request before, so has no idea what
response will appear as a result.
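A toy model of that asymmetry, as a Python sketch: a GET for a URL already in the cache is answered without contacting the origin, while a PUT is always forwarded and, on success, only invalidates any stored copy of that URL (HTTP caching rules require invalidation after unsafe methods). `ToyReverseProxy` and the in-memory origin are hypothetical stand-ins, not Squid internals, and freshness checks are left out.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, bytes]  # (status code, payload)
Origin = Callable[[str, str, bytes], Response]


class ToyReverseProxy:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, Response] = {}
        self.origin_calls = 0

    def handle(self, method: str, url: str, body: bytes = b"") -> Response:
        if method == "GET" and url in self.cache:
            return self.cache[url]  # cache hit: the origin is not contacted

        # Everything else, including every PUT, costs a round trip to the origin.
        self.origin_calls += 1
        status, payload = self.origin(method, url, body)

        if method == "GET" and status == 200:
            self.cache[url] = (status, payload)  # remember for future identical GETs
        elif method in ("PUT", "POST", "DELETE") and 200 <= status < 400:
            self.cache.pop(url, None)  # the stored copy is now outdated
        return status, payload


def make_origin() -> Origin:
    """A trivial in-memory origin server, used only for this demonstration."""
    store: Dict[str, bytes] = {}

    def origin(method: str, url: str, body: bytes) -> Response:
        if method == "PUT":
            created = url not in store
            store[url] = body
            return (201 if created else 204), b""
        if method == "GET" and url in store:
            return 200, store[url]
        return 404, b""

    return origin
```

Running PUT, GET, GET against it shows only the repeated GET saving an origin round trip; the PUT is never answered from the cache.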

>
> The only option to make uploads faster is if the Origin server is aware that the client is using a reverse proxy cache and responds to the upload request before the full payload transfer.
>

Close, but not quite. The server does not need to know about the proxy,
it just has to know the upload payload is a "pointless waste of bandwidth"
(where data loss doesn't matter) and deliver its response early.

For example, this is usually seen with NTLM authentication, where
uploads without credentials are denied early: the upload has to be
repeated in full with the right credentials anyway, so all the bytes from
the first attempt can be dropped in transit by the proxy.
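One common way for a server to deny an upload before the body is even sent is to combine the credential check with the `Expect: 100-continue` handshake. The Python sketch below uses the standard library's `http.server`, whose `BaseHTTPRequestHandler.handle_expect_100()` hook runs after the request headers have arrived but before the body is read. It is an illustration only: the server merely checks that an `Authorization` header is present and does not perform a real (multi-round-trip) NTLM handshake, and the path and token are hypothetical placeholders.

```python
import http.server
import socket
import threading
from http import HTTPStatus


class UploadHandler(http.server.BaseHTTPRequestHandler):
    # handle_expect_100() is only consulted for HTTP/1.1 requests.
    protocol_version = "HTTP/1.1"

    def handle_expect_100(self):
        # Headers are parsed but the body has not been sent yet: reject
        # uploads without credentials before any payload crosses the wire.
        if "Authorization" not in self.headers:
            self.send_response(HTTPStatus.UNAUTHORIZED)
            self.send_header("WWW-Authenticate", "NTLM")
            self.send_header("Content-Length", "0")
            self.send_header("Connection", "close")
            self.end_headers()
            return False
        return super().handle_expect_100()  # sends "100 Continue"

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", "0"))
        self.rfile.read(length)  # a real server would store this payload
        self.send_response(HTTPStatus.CREATED)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def read_status(f):
    """Read one response head from the socket file; return its status code."""
    status = int(f.readline().split()[1])
    while f.readline() not in (b"\r\n", b"\n", b""):
        pass  # skip header lines up to the blank line
    return status


def try_upload(port, body, auth=None):
    """Send a PUT with Expect: 100-continue; return (final status, body_sent)."""
    head = [
        "PUT /hypothetical/report.docx HTTP/1.1",
        f"Host: 127.0.0.1:{port}",
        f"Content-Length: {len(body)}",
        "Expect: 100-continue",
    ]
    if auth is not None:
        head.append(f"Authorization: {auth}")
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("rb") as f:
            sock.sendall(("\r\n".join(head) + "\r\n\r\n").encode("ascii"))
            status = read_status(f)
            if status != HTTPStatus.CONTINUE:
                return status, False  # denied early: the body was never sent
            sock.sendall(body)
            return read_status(f), True


def run_demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), UploadHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        payload = b"x" * 1_000_000
        first = try_upload(port, payload)  # no credentials
        second = try_upload(port, payload, auth="NTLM placeholder-token")
        return first, second
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # ((401, False), (201, True))
```

The first attempt is refused after only the headers have travelled, so none of the 1 MB payload is wasted; the second attempt gets `100 Continue` and then sends the body.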


> Tell me if I'm wrong, but I think that I understand now.
> That means that if I want to "bufferize" the writes, it has to happen with another protocol before the WebDAV connection to Sharepoint Online.
>

The "other protocol" is WebDAV as far as I know. HTTP is just about
delivery of some request and its corresponding response. How WebDAV
transfers use HTTP messaging, and which parts of HTTP and WebDAV the
client and server implement, determine whether the behaviour you want is supported.


You are then running into the difference in definition between "cache"
and "buffer". Caches store *past* data for the purpose of reducing
current/future server work; buffers store *current* data awaiting delivery.
An upload is normally not something seen previously, so it is not cacheable.
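The distinction can be sketched in a few lines of Python. Both classes are hypothetical illustrations, not Squid data structures: a cache keeps a copy of a *past* response that can be handed out again and again, while a buffer holds *current* bytes only until they have been delivered once.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ResponseCache:
    """Keeps past responses so identical future requests can reuse them."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def remember(self, request_key: str, response: bytes) -> None:
        self._store[request_key] = response

    def lookup(self, request_key: str) -> Optional[bytes]:
        return self._store.get(request_key)  # can be served any number of times


class DeliveryBuffer:
    """Holds current bytes that are still waiting to be forwarded."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def push(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def drain(self) -> bytes:
        data = b"".join(self._chunks)
        self._chunks.clear()  # once delivered, the data is gone from the buffer
        return data
```

An upload only ever passes through the second kind of structure: there is no earlier copy for the first one to return.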

Proxies and the network itself *do* buffer data along the way. But that
in no way adds any asynchronous properties to HTTP. The client still has
to wait for the HTTP response to be delivered back to it before it can
consider the HTTP part of that transaction over - the "transaction" in
this context may or may not be the full WebDAV upload+processing on the
server.

HTTP has some mechanisms that can help improve upload behaviour and
avoid pointless bandwidth delivery. Notably the Expect: 100-continue and
Range features, and the 201/202 status codes. WebDAV extensions to HTTP add
various other things I'm not very familiar with.
Between them they can signal to the client that a) the server is contactable
before data gets delivered, b) it should deliver data in small chunks to minimize
loss, and c) any given part has completely arrived and is awaiting some
state (i.e. full object arrival) and/or some async processing.
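How an uploading client might react to these status codes, as a Python sketch. The action strings are hypothetical labels; the code meanings follow the standard HTTP definitions (100 Continue, 201 Created, 202 Accepted meaning the request was accepted but its processing has not completed, 417 Expectation Failed). How to "poll later" after a 202 is application-specific and not shown.

```python
from http import HTTPStatus


def next_upload_step(status: int) -> str:
    """Map a response status seen during an upload to the client's next action."""
    if status == HTTPStatus.CONTINUE:
        return "send body"  # interim response: server is reachable and willing
    if status == HTTPStatus.EXPECTATION_FAILED:
        return "retry without Expect"  # the 100-continue expectation was refused
    if status in (HTTPStatus.UNAUTHORIZED, HTTPStatus.PROXY_AUTHENTICATION_REQUIRED):
        return "retry with credentials"  # any body already sent must be resent
    if status == HTTPStatus.ACCEPTED:
        return "poll later"  # received, but server-side processing is not finished
    if status in (HTTPStatus.OK, HTTPStatus.CREATED, HTTPStatus.NO_CONTENT):
        return "done"
    return "fail"
```

Note that even the 202 case is an ordinary synchronous request/response pair; any asynchrony lives in what the application does after it.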


BUT, as should be obvious, these are all application-logic level things
(i.e. WebDAV) and require explicit support by both the endpoint
applications on server and client for that logic to take place. The
async properties arise from how things are done *between* HTTP
transactions. The interactions are separate synchronous request+response
message pairs as far as Squid and any HTTP infrastructure is concerned.

Amos

Re: Http write cache

Olivier MARCHETTA
Thank you, Amos, for this enlightenment.
I really do appreciate your help.
I will stay with the reverse proxy configuration for our POC.
At the moment we need to cache the libraries' data reads more than the writes.
And the next version of the OneDrive client should help with asynchronous writes.
Still, it will download from the cloud, so Squid is necessary in any case.

Thank you.
Regards,
Olivier MARCHETTA


-----Original Message-----
From: squid-users [mailto:[hidden email]] On Behalf Of Amos Jeffries
Sent: Sunday, September 10, 2017 6:25 PM
To: [hidden email]
Subject: Re: [squid-users] Http write cache
