Not all html objects are being cached


Not all html objects are being cached

boruc
Hi everyone,

I was wondering why some of the pages I visit are not being cached (I mean "main" pages, like www.example.com). If I visit 50 pages, only about 10 end up cached. The text below is from my log files:

store.log:
1485272001.646 RELEASE -1 FFFFFFFF 04F7FA9EAA7FE3D531A2224F4C7DDE5A  200 1485272011        -1 375007920 text/html -1/222442 GET http://www.wykop.pl/

access.log
1485272001.646    423 10.10.10.136 TCP_MISS/200 223422 GET http://www.wykop.pl/ - DIRECT/185.66.120.38 text/html

According to the Squid wiki: "if a RELEASE code was logged with file number FFFFFFFF, the object existed only in memory, and was released from memory." I understand that the requested HTML file wasn't saved to disk, but why?
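Reading that store.log line field by field, as far as I understand the format:

1485272001.646                    timestamp
RELEASE                           action
-1                                cache_dir number (-1: no directory assigned)
FFFFFFFF                          file number (FFFFFFFF: memory only)
04F7FA9EAA7FE3D531A2224F4C7DDE5A  object key (MD5)
200                               HTTP status
1485272011 / -1 / 375007920       Date / Last-Modified / Expires values
text/html                         content type
-1/222442                         expected/received body bytes
GET http://www.wykop.pl/          method and URL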

I'm also posting my squid.conf below. I'd be grateful for your answers!


acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl my_network src 192.168.0.0/24
acl my_phone src 192.168.54.0/24
acl my_net dst 192.168.0.0/24
acl mgr src 10.48.5.0/24
acl new_net src 10.10.10.0/24
acl ex_ft url_regex -i "/etc/squid3/excluded_filetypes.txt"
acl ex_do url_regex -i "/etc/squid3/excluded_domains.txt" # doesn't include any of the 50 visited pages

acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

http_access allow my_network
http_access allow my_phone
http_access allow my_net
http_access allow mgr
http_access allow new_net
http_access allow manager localhost
http_access deny manager

http_access deny !Safe_ports

http_access deny CONNECT !SSL_ports

http_access allow localhost
http_access allow all

http_port 3128

maximum_object_size_in_memory 1024 KB

cache_dir ufs /var/spool/squid3 1000 16 256

cache_store_log /var/log/squid3/store.log

coredump_dir /var/spool/squid3

cache deny ex_ft
cache deny ex_do

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern (Release|Packages(.gz)*)$      0       20%     2880

refresh_pattern .               1000       20%     4320

request_header_access Accept-Encoding deny all

Re: Not all html objects are being cached

boruc
After analyzing requests and responses with Wireshark for a bit, I noticed that many sites that weren't cached had various combinations of the parameters below:

Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check, private, public, max-age, public
Pragma: no-cache

It is possible to strip these headers in Squid using request_header_access and reply_header_access; however, it doesn't work for me, and many pages are still not cached. I am currently using the lines below:

request_header_access Cache-Control deny all
request_header_access Pragma deny all
request_header_access Accept-Encoding deny all
reply_header_access Cache-Control deny all
reply_header_access Pragma deny all
reply_header_access Accept-Encoding deny all

I could also try refresh_pattern, but I don't think the line below will work, because not every URL ends with .html or .htm (you visit www.example.com, not www.example.com/index.html):
refresh_pattern -i \.(html|htm)$          1440   40% 40320 ignore-no-cache ignore-no-store ignore-private override-expire reload-into-ims
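For illustration only: a pattern keyed on a trailing slash instead of a file extension might look like the line below (untested, and whether each override option still exists depends on the Squid version):

# hypothetical: match URLs that end in "/" (e.g. http://www.example.com/)
refresh_pattern -i /$ 1440 40% 40320 ignore-no-cache ignore-no-store ignore-private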

Thank you in advance.

Re: Not all html objects are being cached

Yuri Voinov


On 26.01.2017 2:22, boruc wrote:
> After analyzing requests and responses with Wireshark for a bit, I noticed
> that many sites that weren't cached had various combinations of the
> parameters below:
>
> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
> private, public, max-age, public
> Pragma: no-cache
If the webmaster did this, he had good reason to. Trying to break the RFC
this way, you break the Internet.
>
> It is possible to strip these headers in Squid using
Don't do it.

> request_header_access and reply_header_access; however, it doesn't work for
> me, and many pages are still not cached. I am currently using the lines below:
>
> request_header_access Cache-Control deny all
> request_header_access Pragma deny all
> request_header_access Accept-Encoding deny all
> reply_header_access Cache-Control deny all
> reply_header_access Pragma deny all
> reply_header_access Accept-Encoding deny all
>
> I could also try refresh_pattern, but I don't think the line below will
> work, because not every URL ends with .html or .htm (you visit
> www.example.com, not www.example.com/index.html):
> refresh_pattern -i \.(html|htm)$          1440   40% 40320 ignore-no-cache
> ignore-no-store ignore-private override-expire reload-into-ims
>
> Thank you in advance.
You're welcome.




Re: Not all html objects are being cached

Amos Jeffries
On 26/01/2017 9:44 a.m., Yuri Voinov wrote:

>
>
> On 26.01.2017 2:22, boruc wrote:
>> After analyzing requests and responses with Wireshark for a bit, I noticed
>> that many sites that weren't cached had various combinations of the
>> parameters below:
>>
>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
>> private, public, max-age, public
>> Pragma: no-cache
> If the webmaster did this, he had good reason to. Trying to break the RFC
> this way, you break the Internet.

Instead, use the latest Squid you can. By default Squid caches as much as
it can within the restrictions imposed by the web environment. But
'latest is best', since we are still working on support for HTTP/1.1
features.


I recommend using the tool at <http://redbot.org> to check URL
cacheability instead of Wireshark. It will tell you what those controls
actually *mean* with regard to cacheability, not just that they are present,
and whether there are other problems you may not have noticed in the
various ways there are to fetch any given URL.
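For a quick command-line look at just the raw headers, something like the
following works, assuming curl is available (note a HEAD request may be
answered differently than a GET); redbot goes further and interprets them:

  curl -sI http://www.wykop.pl/ | egrep -i '^(cache-control|pragma|expires|vary|age):'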


The Squid options available are mostly for disabling some caching
operation, so that if you are in a situation where disabling operation
X causes operation Y to cache better, you can tune the behaviour.

You can't really *force* things which are not cacheable to be stored.
They will just be replaced with a newer copy shortly afterwards, with no
benefit gained, just some possibly nasty side effects or real monetary
costs.


>>
>> It is possible to strip these headers in Squid using
> Don't do it.
>> request_header_access and reply_header_access; however, it doesn't work for
>> me, and many pages are still not cached. I am currently using the lines below:
>>
>> request_header_access Cache-Control deny all
>> request_header_access Pragma deny all
>> request_header_access Accept-Encoding deny all
>> reply_header_access Cache-Control deny all
>> reply_header_access Pragma deny all
>> reply_header_access Accept-Encoding deny all
>>

Ah, changing the headers on the *outgoing* traffic does not in any way
affect how Squid interprets the _previously_ received inbound messages.

==> In other words: doing the above is pointless and screws everybody
using your proxy over. Don't do that.


By erasing the Cache-Control response header delivered along with that
content, you are technically in violation of international copyright laws.
==> Don't do that.


By removing Accept-Encoding on requests (only) you can improve the HIT
ratio (by a small amount), but at the cost of a 50-90% bandwidth increase on
each MISS, so the cost increase usually swamps the gains.

==> Making this change leads to the opposite of what you intended. Don't
do that.


Removing the Accept-Encoding header on responses is just pointless. It
controls POST/PUT payload data, which Squid cannot cache anyway. So all
you did was prevent the clients from using less bandwidth.

==> More bandwidth, more costs. Don't do that.


Removing the Pragma header is also pointless. It's used by very ancient
software from the 1990s.

==> If the web application was actually using Pragma for anything
important (some do), you just screwed them over with no gain to
yourself. Don't do that.


>> I could also try refresh_pattern, but I don't think the line below will
>> work, because not every URL ends with .html or .htm (you visit
>> www.example.com, not www.example.com/index.html):
>> refresh_pattern -i \.(html|htm)$          1440   40% 40320 ignore-no-cache
>> ignore-no-store ignore-private override-expire reload-into-ims
>>


Quite. So configure the correct options.

No software is psychic enough to do the operation X that you want when you
configure it to do *only* some other, non-X operation.


Amos


Re: Not all html objects are being cached

Matus UHLAR - fantomas
In reply to this post by Yuri Voinov
>On 26.01.2017 2:22, boruc wrote:
>> After analyzing requests and responses with Wireshark for a bit, I noticed
>> that many sites that weren't cached had various combinations of the
>> parameters below:
>>
>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
>> private, public, max-age, public
>> Pragma: no-cache

On 26.01.17 02:44, Yuri Voinov wrote:
>If the webmaster did this, he had good reason to. Trying to break the RFC
>this way, you break the Internet.

Actually, no. If the webmaster has done the above, he has no damn idea what
those mean (private and public?) or how to provide properly cacheable
content.

Which is very common, and also a reason why many proxy admins tend to ignore
those controls...

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.   -- Matthew D. Fuller

Re: Not all html objects are being cached

Yuri Voinov


On 27.01.2017 2:44, Matus UHLAR - fantomas wrote:

>> On 26.01.2017 2:22, boruc wrote:
>>> [...]
>
> On 26.01.17 02:44, Yuri Voinov wrote:
>> If the webmaster did this, he had good reason to. Trying to break the RFC
>> this way, you break the Internet.
>
> Actually, no. If the webmaster has done the above, he has no damn idea what
> those mean (private and public?) or how to provide properly cacheable
> content.
It was sarcasm.
>
> Which is very common, and also a reason why many proxy admins tend to
> ignore those controls...
>

--
Bugs to the Future



Re: Not all html objects are being cached

reinerotto
In reply to this post by Amos Jeffries
> reply_header_access Cache-Control deny all
Will this only affect downstream caches, or will this Squid itself also ignore
any Cache-Control header information received from upstream?
If it is only effective for downstream caches, then two Squids in a chain
should do the trick.
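A minimal sketch of such a chain, assuming both instances run on the same
host and the back-end listens on port 3129 (untested, and given the replies
above, probably unwise):

# back-end Squid (port 3129): talks to the origin servers and strips
# Cache-Control from the replies it passes back
http_port 3129
reply_header_access Cache-Control deny all

# front-end Squid (port 3128): clients connect here; it caches the
# already-stripped replies it fetches through the back-end instance
http_port 3128
cache_peer 127.0.0.1 parent 3129 0 no-query default
never_direct allow all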

Re: Not all html objects are being cached

Amos Jeffries
On 27/01/2017 11:08 a.m., reinerotto wrote:
>> reply_header_access Cache-Control deny all
> Will this only affect downstream caches, or will this Squid itself also
> ignore any Cache-Control header information received from upstream?
>

It will only affect the clients' caches, e.g. their browser cache.

Amos


Re: Not all html objects are being cached

Amos Jeffries
In reply to this post by Matus UHLAR - fantomas
On 27/01/2017 9:44 a.m., Matus UHLAR - fantomas wrote:

>> On 26.01.2017 2:22, boruc wrote:
>>> After analyzing requests and responses with Wireshark for a bit, I noticed
>>> that many sites that weren't cached had various combinations of the
>>> parameters below:
>>>
>>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
>>> private, public, max-age, public
>>> Pragma: no-cache
>
> On 26.01.17 02:44, Yuri Voinov wrote:
>> If the webmaster did this, he had good reason to. Trying to break the RFC
>> this way, you break the Internet.
>
> Actually, no. If the webmaster has done the above, he has no damn idea what
> those mean (private and public?) or how to provide properly cacheable
> content.
>


I think boruc has just listed all the cache controls he has noticed in one
line, not what is actually being seen ...


> Which is very common and also a reason why many proxy admins tend to ignore
> those controls...
>

... the URLs used for expanded details show the usual combos webmasters
use to 'fix' the broken behaviour of such proxies. For example, adding
"no-cache, private, max-age=0" to get around proxies that ignore various
of the controls.

Amos

Re: Not all html objects are being cached

Amos Jeffries
In reply to this post by Yuri Voinov
On 27/01/2017 9:46 a.m., Yuri Voinov wrote:

>
>
> On 27.01.2017 2:44, Matus UHLAR - fantomas wrote:
>>> On 26.01.2017 2:22, boruc wrote:
>>>> [...]
>>
>> On 26.01.17 02:44, Yuri Voinov wrote:
>>> If the webmaster did this, he had good reason to. Trying to break the RFC
>>> this way, you break the Internet.
>>
>> Actually, no. If the webmaster has done the above, he has no damn idea what
>> those mean (private and public?) or how to provide properly cacheable
>> content.
> It was sarcasm.


You may have intended it to be. But you spoke the simple truth.

Other than 'public', there really are situations which have "good reason"
to send that set of controls all at once.

For example; any admin who wants a RESTful or SaaS application to
actually work for all their potential customers.


I have been watching the below cycle take place for the past 20 years in
HTTP:

Webmaster: don't cache this please.

  "Cache-Control: no-store"

Proxy Admin: ignore-no-store


Webmaster: I meant it. Don't deliver anything you cached without fetching
an updated version.

  ... "no-store, no-cache"

Proxy Admin: ignore-no-cache


Webmaster: really you MUST revalidate before using this data.

 ... "no-store, no-cache, must-revalidate"

Proxy Admin: ignore-must-revalidate


Webmaster: Really I meant it. This is non-storable PRIVATE DATA!

... "no-store, no-cache, must-revalidate, private"

Proxy Admin: ignore-private


Webmaster: Seriously. I'm changing it on EVERY request! Don't store it.

... "no-store, no-cache, must-revalidate, private, max-age=0"
"Expires: -1"

Proxy Admin: ignore-expires


Webmaster: are you one of those dumb HTTP/1.0 proxies that don't
understand Cache-Control?

"Pragma: no-cache"
"Expires: 1 Jan 1970"

Proxy Admin: hehe! I already ignore-no-cache ignore-expires


Webmaster: F*U!  May your clients batch up their traffic to slam you
with it all at once!

... "no-store, no-cache, must-revalidate, private, max-age=0,
pre-check=1, post-check=1"


Proxy Admin: My bandwidth! I need to cache more!

Webmaster: Doh! Oh well, so I have to write my application to force new
content then.

Proxy Admin: ignore-reload


Webmaster: Now what? Oh, HTTPS won't have any damn proxies in the way....

... the cycle repeats again within HTTPS. Took all of 5 years this time.

... the cycle repeats again within SPDY. That took only ~1 year.

... the cycle repeats again within CoAP. The standards are not even
finished yet and it's already underway.


Stop this cycle of stupidity. It really HAS "broken the Internet".


HTH
Amos

Re: Not all html objects are being cached

Yuri Voinov


On 27.01.2017 9:10, Amos Jeffries wrote:

> On 27/01/2017 9:46 a.m., Yuri Voinov wrote:
>> [...]
>
> [...]
>
> Stop this cycle of stupidity. It really HAS "broken the Internet".
All that would be just great if webmasters were conscientious. I will give
just one example.

Only one example.

root @ khorne /patch # wget -S http://www.microsoft.com
--2017-01-27 15:29:54--  http://www.microsoft.com/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
   HTTP/1.1 302 Found
   Server: AkamaiGHost
   Content-Length: 0
   Location: http://www.microsoft.com/ru-kz/
   Date: Fri, 27 Jan 2017 09:29:54 GMT
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Location: http://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  http://www.microsoft.com/ru-kz/
Reusing existing connection to 127.0.0.1:3128.
Proxy request sent, awaiting response...
   HTTP/1.1 301 Moved Permanently
   Server: AkamaiGHost
   Content-Length: 0
   Location: https://www.microsoft.com/ru-kz/
   Date: Fri, 27 Jan 2017 09:29:54 GMT
   Set-Cookie:
akacd_OneRF=1493285394~rv=7~id=6a2316770abdbb58a85c16676a0f84fd; path=/;
Expires=Thu, 27 Apr 2017 09:29:54 GMT
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Location: https://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
   HTTP/1.1 200 OK
   Cache-Control: no-cache, no-store
   Pragma: no-cache
   Content-Type: text/html
   Expires: -1
   Server: Microsoft-IIS/8.0
   CorrelationVector: BzssVwiBIUaXqyOh.1.1
   X-AspNet-Version: 4.0.30319
   X-Powered-By: ASP.NET
   Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type,
Accept
   Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
   Access-Control-Allow-Credentials: true
   P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo
OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
   X-Frame-Options: SAMEORIGIN
   Vary: Accept-Encoding
   Content-Encoding: gzip
   Date: Fri, 27 Jan 2017 09:29:56 GMT
   Content-Length: 13322
   Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
   Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
   Strict-Transport-Security: max-age=0; includeSubDomains
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Length: 13322 (13K) [text/html]
Saving to: 'index.html'

index.html          100%[==================>]  13.01K --.-KB/s    in 0s

2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]

Can you explain to me why this static index.html has this:

Cache-Control: no-cache, no-store
Pragma: no-cache

?

What would break by ignoring CC on this page?


Yes, saving traffic is what matters most, because not everyone, and not
everywhere, has terabit links with unlimited data. Moreover, the number of
users keeps growing while the capacity is finite. In any case, the decision
on how to deal with content in such a situation should remain with the proxy
administrator, not with the developers of the proxy, who hardcode their own
vision, even one backed by the RFC. Because a byte-hit ratio of 10% (vanilla
Squid; after very hard work it can reach up to 30%, but no more) is
ridiculous. In such a situation it would be more honest to cache nothing at
all, but then let's not call Squid a caching proxy. Deploying a server that
demands a lot of attention while yielding only a 10% gain is a mockery of
users.

Let me explain the situation as I see it. Webmasters ban caching everywhere,
in every way possible, because their pages are full of advertising. That is
what pays the money. This is the same reason Google prevents caching of
YouTube. Big money. We do not get that money; in fact, our goal is to
minimize traffic costs. We chose Squid as a tool. And you, with your point
of view, have deprived us of a weapon against unscrupulous webmasters. That
is how it looks.

Again: breaking the Internet should be my choice, not yours. Either follow
the RFC 100%, or do not break it in part. You either wear pants or remove
the cross, as they say.


Re: Not all html objects are being cached

Garri Djavadyan
On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:

> --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
> [...]
>
> Can you explain to me why this static index.html has this:
>
> Cache-Control: no-cache, no-store
> Pragma: no-cache
>
> ?
>
> What would break by ignoring CC on this page?

Hi Yuri,


Why do you think the page returned for URL [https://www.microsoft.com/ru-kz/]
is static and not a dynamically generated one?

The index.html file is the default file name for wget.

man wget:
  --default-page=name
       Use name as the default file name when it isn't known (i.e., for
       URLs that end in a slash), instead of index.html.

In fact, https://www.microsoft.com/ru-kz/index.html is a stub page
("The page you requested cannot be found.").


Garri

Re: Not all html objects are being cached

Yuri Voinov


On 27.01.2017 17:54, Garri Djavadyan wrote:

> On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
>> --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
>> [...]
>>
>> Can you explain to me why this static index.html has this:
>>
>> Cache-Control: no-cache, no-store
>> Pragma: no-cache
>>
>> ?
>>
>> What would break by ignoring CC on this page?
> Hi Yuri,
>
> Why do you think the page returned for URL [https://www.microsoft.com/ru-kz/]
> is static and not a dynamically generated one?
And what's the difference for me? Does it change anything? Besides, it is
easy enough to look at the page and, strangely enough, to open its source
code. And? What do you see there?
>
> The index.html file is the default file name for wget.
And also the name of the default home page on the web. Imagine that: I know
the obvious things. But the question was about something else.
>
> man wget:
>    --default-page=name
>         Use name as the default file name when it isn't known (i.e., for
>         URLs that end in a slash), instead of index.html.
>
> In fact, https://www.microsoft.com/ru-kz/index.html is a stub page
> ("The page you requested cannot be found.").
You are living in the wrong region. This is a geo-dependent page, obviously, yes?

Again: what is the difference? I open it from different workstations, from
different browsers, and I see the same thing. The code is identical. So can
I cache it? Yes or no?


Re: Not all html objects are being cached

Yuri Voinov
In reply to this post by Garri Djavadyan

I understand that I want to conclusively prove my case. But for the sake of
objectivity: is it only dynamic pages that are dynamically generated? Maybe
this decision should still be left to the administrator? If I see that
something is broken, or users complain to me, the cache deny directive has
not been abolished yet, has it?
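That is, something along the lines of the sketch below, with a list of
problem sites that I can edit the moment anything breaks (file name
hypothetical):

acl broken_sites dstdomain "/etc/squid3/broken_sites.txt"
cache deny broken_sites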


On 27.01.2017 17:54, Garri Djavadyan wrote:
> [...]



Re: Not all html objects are being cached

Antony Stone
In reply to this post by Yuri Voinov
On Friday 27 January 2017 at 12:58:52, Yuri wrote:

> Again: what is the difference? I open it from different workstations, from
> different browsers, and I see the same thing. The code is identical. So can
> I cache it? Yes or no?

You're entitled to do whatever you want to, following standards and
recommendations or not - just don't complain when choosing not to follow those
standards and recommendations results in behaviour different from what you
wanted (or what someone else intended).

Oh, and by the way, what did you mean earlier when you said:

> You either wear pants or remove the cross, as they say.

        ?


Antony.

--
"640 kilobytes (of RAM) should be enough for anybody."

 - Bill Gates

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Not all html objects are being cached

Yuri Voinov


On 27.01.2017 18:05, Antony Stone wrote:
> On Friday 27 January 2017 at 12:58:52, Yuri wrote:
>
>> Again: what is the difference? I open it from different workstations, from
>> different browsers, and I see the same thing. The code is identical. So can
>> I cache it? Yes or no?
> You're entitled to do whatever you want to, following standards and
> recommendations or not - just don't complain when choosing not to follow those
> standards and recommendations results in behaviour different from what you
> wanted (or what someone else intended).
All this crazy debate reminds me of Microsoft Windows. Windows knows better
why the administrator should not have full access. Windows knows better how
it should work. Windows knows better how to talk to the system administrator
so that he can still be called a system administrator.

Antonio, have you seen me, even once, complaining about the consequences of
my own actions?

>
> Oh, and by the way, what did you mean earlier when you said:
>
>> You either wear pants or remove the cross, as they say.
> ?
This is the punchline of a good Russian joke about a priest who had sex.
I meant that one should either stop having sex, or remove the pectoral
cross. It means one has to be consistent.


Re: Not all html objects are being cached

Antony Stone
On Friday 27 January 2017 at 13:15:21, Yuri wrote:

> On 27.01.2017 18:05, Antony Stone wrote:
>
> > You're entitled to do whatever you want to, following standards and
> > recommendations or not - just don't complain when choosing not to follow
> > those standards and recommendations results in behaviour different from
> > what you wanted (or what someone else intended).
>
> All this crazy debate reminds me of Microsoft Windows. Windows knows better
> why the administrator should not have full access. Windows knows better how
> it should work. Windows knows better how to talk to the system administrator
> so that he can still be called a system administrator.

That should remind you of OS X and Android as well, at the very least (and
quite possibly systemd as well).

My opinion is that it's your choice whether to run Microsoft Windows (or Apple
OS X, or Google Android) or not - but you have to accept it as a whole
package; you can't say "I want some of the neat features, but I want them to
work *my* way".

If you don't accept all aspects of the package, then don't use it.

> Antonio, have you seen me, even once, complaining about the consequences of
> my own actions?

You seem to continually complain that people are recommending not to try
going against standards, or not to try defeating the anti-caching directives
on the websites you find.

It's your choice to try doing that; people are saying "but if you do that, bad
things will happen, or things will break, or it just won't work the way you
want it to", and then you say "but I don't like having to follow the rules".

That's what I meant about complaining about the consequences of your actions.


Antony.

--
"Life is just a lot better if you feel you're having 10 [small] wins a day
rather than a [big] win every 10 years or so."

 - Chris Hadfield, former skiing (and ski racing) instructor

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Not all html objects are being cached

Yuri Voinov


On 27.01.2017 18:25, Antony Stone wrote:

> On Friday 27 January 2017 at 13:15:21, Yuri wrote:
>
>> On 27.01.2017 18:05, Antony Stone wrote:
>>
>>> You're entitled to do whatever you want to, following standards and
>>> recommendations or not - just don't complain when choosing not to follow
>>> those standards and recommendations results in behaviour different from
>>> what you wanted (or what someone else intended).
>> All this crazy debate reminds me of Microsoft Windows. Windows knows better
>> why the administrator should not have full access. Windows knows better how
>> it should work. Windows knows better how to talk to the system administrator
>> so that he can still be called a system administrator.
> That should remind you of OS X and Android as well, at the very least (and
> quite possibly systemd as well)
>
> My opinion is that it's your choice whether to run Microsoft Windows (or Apple
> OS X, or Google Android) or not - but you have to accept it as a whole
> package; you can't say "I want some of the neat features, but I want them to
> work *my* way".
>
> If you don't accept all aspects of the package, then don't use it.
I just want to have a choice and an opportunity to say - "F*ck you, man,
I'm the System Administrator".

If you do not want the RFC violated, then remove HTTP violations support
altogether. If you remember, this mode is now enabled by default.

You do not have to tell me what to use. I am an administrator and wish to be
able to choose my tools, and not to be in a situation where the choice is
made for me.


>
>> Antonio, have you seen me, even once, complaining about the consequences
>> of my own actions?
> You seem to continually complain that people are recommending not to try going
> against standards, or trying to defeat the anti-caching directives on websites
> you find.
>
> It's your choice to try doing that; people are saying "but if you do that, bad
> things will happen, or things will break, or it just won't work the way you
> want it to", and then you say "but I don't like having to follow the rules".
>
> That's what I meant about complaining about the consequences of your actions.
It is my right and my choice. Personally, I do not complain about the
consequences; I have enough tools to solve any problem.

Enough lecturing me. The OP asked why his static HTML did not get cached.
And he is being told that dragons live there, and why he is wrong to want
to cache anything and everything.


Re: Not all html objects are being cached

Garri Djavadyan
In reply to this post by Yuri Voinov
On Fri, 2017-01-27 at 17:58 +0600, Yuri wrote:

>
> On 27.01.2017 17:54, Garri Djavadyan wrote:
> > On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
> > > --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
> > > [...]
> > >
> > > Can you explain to me why this static index.html has this:
> > >
> > > Cache-Control: no-cache, no-store
> > > Pragma: no-cache
> > >
> > > ?
> > >
> > > What would break by ignoring CC on this page?
> >
> > Hi Yuri,
> >
> > Why do you think the page returned for URL
> > [https://www.microsoft.com/ru-kz/] is static and not a dynamically
> > generated one?
>
> And for me, what's the difference? Does it change anything? In
> addition, 
> it is easy to see on the page and even the eyes - strangely enough -
> to 
> open its code. And? What do you see there?

I see the official Microsoft home page for the KZ region. The page is full
of javascript and product offers. It makes sense to expect that the page
could change fairly often.


> > The index.html file is the default file name for wget.
>
> And also the name of the default home page on the web. Imagine that: I know
> the obvious things. But the question was about something else.
> >
> > man wget:
> >    --default-page=name
> >         Use name as the default file name when it isn't known (i.e., for
> >         URLs that end in a slash), instead of index.html.
> >
> > In fact, https://www.microsoft.com/ru-kz/index.html is a stub page
> > ("The page you requested cannot be found.").
>
> You are living in the wrong region. This is a geo-dependent page, obviously,
> yes?

What I mean is that the pages https://www.microsoft.com/ru-kz/ and
https://www.microsoft.com/ru-kz/index.html are not the same. You can easily
confirm it.


> Again: what is the difference? I open it from different workstations, from
> different browsers, and I see the same thing. The code is identical. So can
> I cache it? Yes or no?

I'm a new member of the Squid community (about 1 year). While following the
community activity I found that you don't grasp the advantages of HTTP/1.1
over HTTP/1.0 for caching systems; especially its ability to _safely_ cache
and serve the same number of objects (I believe even more) as HTTP/1.0
compliant caches do, while not breaking the internet. The main tool of
HTTP/1.1 compliant proxies is the _revalidation_ process. HTTP/1.1 compliant
caches like Squid tend to cache all possible objects and later use
revalidation for dubious requests. In fact revalidation is not a costly
process, especially when using conditional GET requests.
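For instance, a revalidation round trip is just a conditional GET, roughly
like this (an illustrative exchange with a made-up ETag value, not a real
capture):

GET /ru-kz/ HTTP/1.1
Host: www.microsoft.com
If-None-Match: "abc123"
If-Modified-Since: Fri, 27 Jan 2017 09:29:56 GMT

HTTP/1.1 304 Not Modified

A 304 response carries no body, so the cached copy is reused and only the
headers cross the wire.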

I found that most of your complaints on the mailing list and in Bugzilla
are related to the HTTPS scheme. FYI: the primary tool (revalidation) does
not work for the HTTPS scheme in any current Squid branch at the moment.
See bug 4648.

Try to apply the proposed patch and update all related bug reports.

HTH


Garri

Re: Not all html objects are being cached

joseph
Hi. It's not about the HTTPS scheme, it's about everything.
I decided not to get involved in the argument, but why not, this is the last
thing I should say, once:
They are right that most admins have no knowledge, so it's OK to babysit them
as it is. But --enable-http-violations should fully ignore cache control, and
through refresh_pattern the admin should be able to control the behaviour he
needs; otherwise they should take --enable-http-violations out, or allow us
to control
Pragma: no-cache and Cache-Control: no-cache (and the rest)
in both requests and replies.
And it's up to us to fix a broken site, since almost 80% or more of web admins
and programmers use those headers just to prevent caching, not because caching
breaks the page. It has nothing to do with some old broken page where we could
fix the object to stay fresh.
Soon all web programmers will use those controls and Squid will end up a cache
server that cannot cache anything at all. So:
let admins who worry about breaking bad sites use Squid without
--enable-http-violations, and let the good admins who know what they are doing
control what they need, with --enable-http-violations fully open, no
restrictions at all.
HTTPS is rarely used; not everywhere can use it, depending on the country.
Bye,
joseph
As for my setup, I have HTTP only, and as it is, Squid saves me only 5% of all
the HTTP bandwidth.
************************** ***** Crash to the future **** **************************