Squid reverse-proxy. How it decides when to refresh?


Alexander Lazarev
Hello guys!
I'm using squid as a reverse-proxy, and I can't understand how squid decides when to check the origin server for a fresh version of a file.
It looks like for some documents it sends 'If-Modified-Since' or similar headers and, if it gets a 304, serves the file from cache. For other documents it doesn't check for a fresh version and always serves from cache.
I was testing this with curl without any additional headers.
Can someone explain how this works, or where I can read about it in detail? And is it possible to make squid always check for a fresh version before serving from cache?
Thanks!
Alexander

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Re: Squid reverse-proxy. How it decides when to refresh?

Amos Jeffries
Administrator
On 26/08/17 00:37, Alexander Lazarev wrote:
> Hello guys!
> I'm using squid as a reverse-proxy. And I can't understand how squid
> decides when to check for fresh version of file from origin server.
> It looks like for some documents it sends 'If-Modified-Since' or similar
> headers and if it gets 304, it serves file from cache. And for some
> documents it doesn't check for fresh version and always serves from cache.
> I was testing that with curl without any additional headers.
> Can some explain how that works or where I can read about that in
> detail?

The HTTP specification RFC 723x series was re-written to be a lot more
easily understood, so those are probably the best place to read up about it.

The features you are asking about are covered in:

Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
  <https://tools.ietf.org/html/rfc7232>

Hypertext Transfer Protocol (HTTP/1.1): Caching
  <https://tools.ietf.org/html/rfc7234>


> And is it possible to make squid always check for fresh version
> before serving from cache?

It does when needed. The situation may be clearer after reading the above.
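
For orientation, the revalidation step described in RFC 7232 comes down to a date comparison on the origin side. A minimal Python sketch, assuming only Last-Modified / If-Modified-Since are in play (no ETag); the function name and the second date are made up for illustration:

```python
from email.utils import parsedate_to_datetime

def origin_status(last_modified: str, if_modified_since: str) -> int:
    """Return 304 if the resource has not changed since the validator date, else 200."""
    modified = parsedate_to_datetime(last_modified)
    validator = parsedate_to_datetime(if_modified_since)
    # Unchanged means: last modification is earlier than or equal to the date sent.
    return 304 if modified <= validator else 200

# Cache revalidates with the date it stored; nothing changed on the origin.
print(origin_status("Thu, 31 Aug 2017 17:29:57 GMT", "Thu, 31 Aug 2017 17:29:57 GMT"))  # 304
# The file was modified after the cached copy was stored.
print(origin_status("Fri, 01 Sep 2017 10:00:00 GMT", "Thu, 31 Aug 2017 17:29:57 GMT"))  # 200
```

A 304 lets the proxy keep serving its stored copy; a 200 carries a new body that replaces it.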

Amos
Re: Squid reverse-proxy. How it decides when to refresh?

Alexander Lazarev
Thank you for the reply!
I still don't understand what's happening.
I create a file 1.txt with a little bit of text data and request it with curl. The web server returns it to squid without any cache-related headers, and squid returns it to me. When I get it with curl one more time, squid serves it straight from cache without validation (no entries in the log on the origin server).
I create one more file, 2.txt, with some data and do the same things, with the same headers in the response. The second response from squid is from cache but validated against the origin server (I see a 304 in the origin server logs).
What could be wrong?
I thought maybe squid was applying heuristic freshness, but I didn't see any warnings in the headers.
Maybe some sort of a bug?

Re: Squid reverse-proxy. How it decides when to refresh?

Alexander Lazarev
Well, it looks like squid is using heuristics after all:
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(291) refreshCheck: checking freshness of 'http://mydomain.zone/1.txt'
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(312) refreshCheck: Matched '<none> 0 20%% 259200'
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(314) refreshCheck:       age:    65955
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(316) refreshCheck:       check_time:     Fri, 01 Sep 2017 11:49:12 GMT
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(318) refreshCheck:       entry->timestamp:       Thu, 31 Aug 2017 17:29:57 GMT
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(179) refreshStaleness: No explicit expiry given, using heuristics to determine freshness
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(198) refreshStaleness: Last modified 5524975 sec before we cached it, L-M factor 20.00% = 1104995 sec freshness lifetime
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(205) refreshStaleness: FRESH: age 65955 <= stale_age 1104995
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(338) refreshCheck: Staleness = -1
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(461) refreshCheck: Object isn't stale..
2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(470) refreshCheck: returning FRESH_LMFACTOR_RULE
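
The numbers in this trace can be re-derived by hand. A minimal Python sketch, assuming the simplified rule the log lines suggest (age = check_time minus entry->timestamp, freshness lifetime = the L-M percentage of the interval between Last-Modified and caching). The function names are made up for illustration and this is not Squid's code:

```python
from email.utils import parsedate_to_datetime

def age_seconds(entry_timestamp: str, check_time: str) -> int:
    """Age of the cached response in whole seconds."""
    cached = parsedate_to_datetime(entry_timestamp)
    checked = parsedate_to_datetime(check_time)
    return int((checked - cached).total_seconds())

def lm_factor_lifetime(lastmod_delta: int, percent: int) -> int:
    """Heuristic lifetime: a percentage of the Last-Modified-to-cached interval."""
    return lastmod_delta * percent // 100

# Values copied from the debug output above.
age = age_seconds("Thu, 31 Aug 2017 17:29:57 GMT", "Fri, 01 Sep 2017 11:49:12 GMT")
lifetime = lm_factor_lifetime(5524975, 20)

# Comparison written the way the log line prints it (age <= stale_age).
print(age, lifetime, "FRESH" if age <= lifetime else "STALE")
# prints: 65955 1104995 FRESH
```

So the file was about 64 days old when it was cached, which gives a heuristic lifetime of roughly 12.8 days, and the cached copy is only about 18 hours old.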

It's a shame there's no Warning header, as https://tools.ietf.org/html/rfc7234#section-5.5.4 suggests.
I guess I need to set refresh_pattern's max option to a minimal value.

Re: Squid reverse-proxy. How it decides when to refresh?

Amos Jeffries
Administrator
On 02/09/17 00:18, Alexander Lazarev wrote:

> Well. looks like squid using heuristics after all:
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(291) refreshCheck:
> checking freshness of 'http://mydomain.zone/1.txt'
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(312) refreshCheck:
> Matched '<none> 0 20%% 259200'
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(314) refreshCheck:      
> age:    65955
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(316) refreshCheck:      
> check_time:     Fri, 01 Sep 2017 11:49:12 GMT
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(318) refreshCheck:      
> entry->timestamp:       Thu, 31 Aug 2017 17:29:57 GMT
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(179) refreshStaleness: No
> explicit expiry given, using heuristics to determine freshness
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(198) refreshStaleness:
> Last modified 5524975 sec before we cached it, L-M factor 20.00% =
> 1104995 sec freshness lifetime
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(205) refreshStaleness:
> FRESH: age 65955 <= stale_age 1104995
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(338) refreshCheck:
> Staleness = -1
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(461) refreshCheck: Object
> isn't stale..
> 2017/09/01 14:49:12.296 kid2| 22,3| refresh.cc(470) refreshCheck:
> returning FRESH_LMFACTOR_RULE
>
> It's a shame there's no warning header, like
> "https://tools.ietf.org/html/rfc7234#section-5.5.4" suggests.

There should be when that cached response becomes 24 hrs old. That log
says Squid only received the object ~18 hrs ago, so the cached
*response* has not been around for 24hrs yet even though the *content*
it refers to on the server is older.

Content on a server being old is no particular cause for alarm if it
gets cached a few seconds/hrs/mins on a proxy.

Though note that Warning headers about heuristics being used are
OPTIONAL, so it is also not a problem if they are absent.
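
The 24-hour condition is the one RFC 7234 section 5.5.4 gives for Warning 113 (Heuristic Expiration): a cache SHOULD add it when the heuristic freshness lifetime is greater than 24 hours and the response's age is also greater than 24 hours. A small Python sketch using the numbers from the quoted log; the function name is hypothetical:

```python
DAY = 24 * 60 * 60  # 86400 seconds

def should_warn_113(heuristic_lifetime: int, age: int) -> bool:
    """True when both the heuristic lifetime and the age exceed 24 hours."""
    return heuristic_lifetime > DAY and age > DAY

# Lifetime and age from the log: the copy is ~18 hours old, so no warning yet.
print(should_warn_113(1104995, 65955))  # False
# Same lifetime once the cached copy is 25 hours old.
print(should_warn_113(1104995, 90000))  # True
```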

> Guess, I need to set refresh_pattern's max option to minimal value.
>

Any particular reason you are worried about all this?

Heuristic freshness is normal and usually perfectly fine. A proxy making
heuristic decisions is only a problem if it is ignoring server or client
instructions about the content cacheability. Also, a reverse-proxy as an
agent of the server effectively has permission to ignore things the
client wants - though it is usually a good idea to do a background
revalidation if the client insists strongly on new content (e.g. the
reload-into-ims option), because that tends to mean there is some
problem with what it got earlier [maybe what's in the cache].


> On Thu, Aug 31, 2017 at 8:26 PM, Alexander Lazarev wrote:
>
>     Thank you for reply!
>     I still don't understand what's happening.
>     I create file 1.txt with a little bit of text data. Request it with
>     curl. Web-server returns it without any cache related headers to
>     squid, squid returns it to me. Getting it with curl one more time,
>     squid serves it straight from cache without validation(no entries in
>     log on origin server).
>     I create one more file 2.txt with some data. Do same things, same
>     headers in response. Second response from squid is from cache but
>     validated from origin server(i see 304 in origin server logs).
>     What could be wrong?

Nothing wrong. Both sequences are valid and normal. The difference could
just be a timing variation as small as a nanosecond in what operations
are performed relative to each other - with heuristics based on 0.2 of a
recently created object's age, HTTP's 1-second timing resolution is
both very coarse and very sensitive to rounding limits.
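
To see how coarse this gets for a file that was only just created, here is a rough Python sketch. It assumes the lifetime is 20% of the whole-second interval between Last-Modified and the moment of caching, truncated to whole seconds; the sample intervals are invented for illustration:

```python
def lm_factor_lifetime(lastmod_delta: int, percent: int = 20) -> int:
    """Heuristic lifetime in whole seconds for a given Last-Modified-to-cached interval."""
    return lastmod_delta * percent // 100

# Seconds between writing the file and the first request through the proxy.
for delta in (3, 4, 5, 9, 10):
    print(delta, lm_factor_lifetime(delta))
# prints:
# 3 0
# 4 0
# 5 1
# 9 1
# 10 2
```

Under these assumptions a file first fetched 4 seconds after it was written gets a zero-second lifetime, while one fetched a second later gets one second, so two files that look identical can plausibly behave differently on the next request.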

Amos
Re: Squid reverse-proxy. How it decides when to refresh?

Alexander Lazarev
It's all pretty clear to me now, after reading the RFC and working out how it relates to refresh_pattern usage.
Thank you.
