Windows Updates a Caching Stub zone, A windows updates store.

Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
Windows Updates a Caching Stub zone
<http://www1.ngtech.co.il/wpe/?page_id=301>

I have been working for quite some time trying to see whether it is possible
to cache Windows updates using Squid.
I have seen that it is possible, and to test the concept I wrote a small
proxy and a helper tool.
The tools are a proof of concept and an almost complete implementation of
the idea.
I consider it a Squid helper tool.

Feel free to use the tool, and if you need any help using it just contact me
here or off-list.

Eliezer

----
Eliezer Croitoru <http://ngtech.co.il/lmgtfy/>
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]
 


_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users


Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Hi,

Great idea. I have been looking for something like this for years, and I was too lazy to start it myself ;)

I am going to test your code in a multi-thousand-client ISP.

It would be even better if you used the experience of http://www.wsusoffline.net/, especially for your fetcher. It is GPL.

Also, the IP address 13.107.4.50 is mainly used by Microsoft for its download services. With services like https://www.virustotal.com/en-gb/ip-address/13.107.4.50/information/ we have found that other domains are also used for update/download services. It might not be bad to create special handling for this IP address.

Thanks in advance

Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
Hey Omid,

The key concept is that it is possible, but not always worth the effort.
I have tested it working on Windows 10 and a couple of other platforms, but I haven't verified how it reacts to every version of Windows 7.
I have tested how things work with WSUSOFFLINE; you will need to change the regex dstdomain into:
acl wu dstdom_regex download\.windowsupdate\.com$ download\.microsoft\.com$
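As a quick sanity check (a sketch, not part of the helper; the hostnames are examples), the same regexes can be exercised with grep before loading them into Squid:

```shell
# Test the dstdom_regex patterns against sample destination hostnames.
# Squid tries each pattern separately; grep -E emulates that with alternation.
matches_wu() {
  printf '%s\n' "$1" | grep -Eq '(download\.windowsupdate\.com|download\.microsoft\.com)$'
}

matches_wu "au.download.windowsupdate.com" && echo "would go to the store"
matches_wu "www.example.com" || echo "would go direct"
```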

Now you need my latest updated version in order to avoid caching of MS AV updates, which are critical and should never be cached for more than one hour.

You can try to "seed" the cache using a client running WSUSOFFLINE, but to my understanding it's not required, since you would store more than you actually need.
If one user downloads an ancient or special update, you don't need it stored unless you can predict it will be downloaded a lot.

Let me know if you need some help with it.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]



Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Hi,

Questions:
1. What happens if the disk or partition becomes full?
2. Is there a way to use more than one location for the store?
3. Currently, hits from your code cannot be counted. How can I use QoS flows/TOS to mark those hits?

Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
Hey Omid,

1. You should understand what you are doing and not blindly fetch downloads.
The estimate is that you will need a maximum of 100GB of storage for the whole "store" for a period of time.
This is also because the Microsoft Windows Update service will not download files without a need.
The fetcher should help you download periodic updates, but I assume that the updates have a limit... You should consider asking MS what is expected to be in the downloads, or when downloads happen.

2. If you need more than one location, you should use a logical volume rather than spreading the store manually over more than one disk.
This is based on the basic understanding that the service is a web service that serves files, and you should treat it the same way as any other.
When I run a web service and need more than one disk, I do not "spread" it manually but use OS-level tools.
I trust the OS and the logical volume management tools to do their work properly. When I lose my trust in them I will stop using this OS; it's as simple as that.
3. The HITs are counted, but I need to dig into the code to verify how a HIT is logged and how it can be counted manually.
QoS or TOS, by what? How?
The service has one way out and one way in.
If the requested file is in the store, you will not see outgoing traffic for the file.
The right way to show a HIT in this service is to change the response headers file to carry another header.
This can be done manually with a tiny script, but not as part of the store software.
An example of such an addition would be:
# perl -pi -e '$/=""; s/\r\n\r\n/\r\nX-Store-Hit: HIT\r\n\r\n/;' /var/storedata/header/v1/fff8db4723842074ab8d8cc4ad20a0f97d47f6d849149c81c4e52abc727d43b5

This will change the response headers, and these can then be seen in Squid's access.log using a logformat.
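As a self-contained illustration of that one-liner (the file name and header value here are made up for the demo), the paragraph-mode perl edit can be tried on a synthetic stored-header file:

```shell
# Create a fake stored response-header file and inject the marker header.
# $/="" puts perl in paragraph mode, so the \r\n\r\n terminator stays in one
# record instead of being split across line-by-line reads.
tmp=$(mktemp -d)
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\n\r\n' > "$tmp/hdr"

perl -pi -e '$/=""; s/\r\n\r\n/\r\nX-Store-Hit: HIT\r\n\r\n/;' "$tmp/hdr"

grep 'X-Store-Hit' "$tmp/hdr"   # the marker is now part of the stored headers
rm -rf "$tmp"
```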
I can think of other ways to report this, but a question:
If it works as expected, and is expected to always work, why would you want to see the HIT in a QoS or TOS mark?
QoS and TOS levels of socket manipulation would require me to find a way to hack the simple web service, and I probably won't go that way.
I do know that you can manipulate QoS or TOS in Squid if some header exists in the response.

I might be able to look at the subject if there is a real technical/functional need for it in long-term usage.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]



Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Dear Eliezer,

Thanks for the reply.

1. I am trying to understand, but with your description it becomes simpler.

2. I already use logical volumes. Silly question.

3. I don't want just a hit in the log; let me describe my need. Currently we exclude cache hits (based on TOS value) from our customers' reserved bandwidth. For example, you have a 150Mbps Internet link from our company and we enforce that limit on our QoS routers, but we exclude cache hits from your 150M, so you may get more than that when you are downloading cache hits.

qos_flows local-hit=0x90
qos_flows sibling-hit=0x90
qos_flows parent-hit=0x90

But the hits from your code cannot be counted. Even if you help me do that with Linux iptables + Squid, it would be fine.

Thanks

Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Apart from my previous email: maybe this is a bug, maybe not, but the fetcher does not release open files/sockets.
Its number of open files just grows. I have currently added 'ulimit 65535' at line 4 of fetch-task.sh to see what happens; before that, it was killed.

Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru

Hey Omid,

Indeed my preference is that you ask if you can, and I will try to give you a couple more details on the service and the subject.

Windows updates are considered very static, since they have Last-Modified and Date headers.

Beyond that, they support validating and invalidating requests based on these times.

This is what makes it so simple to store them.

Now, the main issue (which you probably already know) is that clients request partial content from a full static object.

The full file or object is a resource, and when you have the full object, most web services can serve the partial content.

Technically, if the client software used static ranges when accessing a resource, it would be very simple to "map" a range request onto a specific object "ID", but the issue is that there are scenarios in which the client asks for multiple ranges in the same request, and everything gets a bit complicated.

From a cache (Squid) point of view, when a client runs a "fetch" operation, he also populates the cache.

This is the most "preferred" way of handling cache population, since it relies on a need which is somehow considered as required.

When you look at it, though, in many cases it's a bit silly, and can be considered somewhat simple-minded when you are talking about GBs of static content.

When I look at MS updates I see lots of "Cache-Control: public,max-age=172800" in responses, and it might be based on the assumption that the object is predicted to be part of an update "torrent" of about 48 hours.

The real world is far from this header, and caches need to be smarter in order to avoid re-population and re-deletion of existing content.

Since MS updates will probably be used over and over again by real clients, it is sometimes good to just store them for a period of time.

For example, there aren't many Windows XP machines out there under paid support, but if clients are still updating, then it's right to have the updates.

What I did was simply write a simple web service, which is also a forward proxy, based on a different file-system layout compared to the standard ones.

You have a storedata directory, which can be changed on the command line.

You have a listening port, and you also have some level of debug info.

The store data directory has three important subdirectories:

request/v1/
header/v1/
response/v1/

Since it's a simple web service that relies on a specific file-system structure, it doesn't have the TOS and QoS features that much lower-level services have.

Since you have full control over the web service, and the response headers are reliable, you can safely use some kind of internal response header and be sure that MS and their CDN network will not use it and will not "harm" your statistics.

You will just need to use the concept mentioned in the HIT-MISS thread from 2012:

acl mshit rep_header X-Company-Ms-Cache HIT
clientside_tos 0x30 mshit

And you can get wild with the name, to verify that it will be 100% unique and will not collide with the CDN headers.

You can also use another term than HIT; for example, "INTERNAL-CACHE-SERVER" would probably not be coming from the upstream CDN.

Or even add a unique ID (#uuidgen) for this service, which should never be mimicked.

Since it's a web service with static header files, you will just need to use the perl script I sent you in the last email to inject these headers into the response files.

If the store service is only piping the connection to the upstream services, the response will not include your customized embedded headers.

The response headers are at:

/var/storedata/header/v1/*

The bodies are at:

/var/storedata/response/v1/*

Just as a side note: this store service was designed for MS updates only, and using it for other services is prohibited at the code level.

In the meanwhile I will look at the TOS/QoS options, if at all.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]


Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Dear Eliezer,

Unfortunately, no success. I will describe what I did; maybe I missed something.

I ran the command:
perl -pi -e '$/=""; s/\r\n\r\n/\r\nX-SHMSCDN: HIT\r\n\r\n/;'  /cache1/header/v1/*

and verified that the text was injected correctly.

Squid config:

acl mshit rep_header X-SHMSCDN HIT
clientside_tos 0x30 mshit

But I got the following log over and over:
2016/07/18 16:26:31.927 kid1| WARNING: mshit ACL is used in context without an HTTP response. Assuming mismatch.
2016/07/18 16:26:31.927 kid1| 28,3| Acl.cc(158) matches: checked: mshit = 0


One more thing: as I am not so familiar with perl, may I ask you to please edit it to ignore the files which already have the text?

Thanks

Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
About the mismatch log output I cannot say a thing, since I have not researched it.
As for an option to add a HIT header, you can use the following script:
https://gist.github.com/elico/ac58073812b8cad14ef154d8730e22cb
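For readers who just want the idea, an idempotent variant of the earlier one-liner (a sketch; this is an assumption, not necessarily how the linked gist works) guards the substitution so files that already carry the header are skipped:

```shell
# Add X-SHMSCDN: HIT to a stored header file only if it is not already present.
tmp=$(mktemp -d)
printf 'HTTP/1.1 200 OK\r\n\r\n' > "$tmp/hdr"

inject() {
  perl -pi -e '$/=""; s/\r\n\r\n/\r\nX-SHMSCDN: HIT\r\n\r\n/ unless /X-SHMSCDN:/;' "$1"
}

inject "$tmp/hdr"            # first run injects the header
inject "$tmp/hdr"            # second run leaves the file unchanged
grep -c 'X-SHMSCDN' "$tmp/hdr"
rm -rf "$tmp"
```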

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]



Re: Windows Updates a Caching Stub zone, A windows updates store.

Alex Rousskov
On 07/18/2016 05:39 AM, Omid Kosari wrote:

> acl mshit rep_header X-SHMSCDN HIT
> clientside_tos 0x30 mshit

You cannot use response-based ACLs like rep_header with clientside_tos.
That directive is currently evaluated only at request processing time,
before there is a response.

> 2016/07/18 16:26:31.927 kid1| WARNING: mshit ACL is used in context without
> an HTTP response. Assuming mismatch.

... which is what Squid is trying to tell you.
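A request-time workaround (a sketch, not from Alex's message) is to mark by destination instead, accepting that it tags misses as well as hits, since dstdomain ACLs are available when clientside_tos is evaluated:

```
# squid.conf sketch: request-based marking (matches all WU traffic, not only hits)
acl wu_req dstdomain .download.windowsupdate.com
clientside_tos 0x30 wu_req
```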


HTH,

Alex.


Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Alex Rousskov wrote
> You cannot use response-based ACLs like rep_header with clientside_tos.
> That directive is currently evaluated only at request processing time,
> before there is a response.
Apart from that, can you confirm that we may use a custom header in rep_header?
Also, the problem is that the mshit ACL does not count at all.

Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Also, I have seen that another guy successfully did something like this (not exactly) in this thread: http://squid-web-proxy-cache.1019090.n4.nabble.com/cache-peer-hit-miss-and-reject-td4661928.html

Re: Windows Updates a Caching Stub zone, A windows updates store.

Amos Jeffries
On 19/07/2016 5:50 p.m., Omid Kosari wrote:
> Also i have seen that another guy did successfully something like that (not
> exactly ) in this thread
> http://squid-web-proxy-cache.1019090.n4.nabble.com/cache-peer-hit-miss-and-reject-td4661928.html
>

No. Niki's ACLs had nothing to do with HTTP reply state.

The external ACL helper stored and used *request URL* information. The
first request for any URL got a fixed value; only the second or later
values varied, based on the externally stored knowledge. It never
predicted anything variable in advance, and never used reply headers.
Amos


Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Eliezer Croitoru-2 wrote
> Hey Omid,
>
> Indeed my preference is that if you can ask, ask, and I will try to give you
> a couple more details on the service and the subject.
Hey Eliezer,

1. I have had refresh patterns since before your code. Currently I prefer not to store Windows updates in Squid's internal storage, to avoid duplication. Now what should I do? Delete this refresh pattern? Or even create a pattern not to cache Windows updates?

refresh_pattern -i (microsoft|windowsupdate)\.com/.*?\.(cab|exe|dll|ms[iuf]|asf|wm[va]|dat|zip|iso|psf)$ 10080 100% 172800 ignore-no-store ignore-reload ignore-private ignore-must-revalidate override-expire override-lastmod

2. Is the position of your Squid config important, to prevent logical conflicts? For example, should it go before the refresh patterns above to prevent duplication?

acl wu dstdom_regex \.download\.windowsupdate\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct deny all

3. Is it a good idea to change your Squid config as below, to get more hits? Or maybe it is a big mistake!

acl msip dst 13.107.4.50
acl wu dstdom_regex \.download\.windowsupdate\.com$ \.download\.microsoft\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 allow GET msip !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct allow GET msip !wu-rejects
never_direct deny all

4. Current storage capacity is 500G; more than 50% of it is already full and growing fast. Is there any mechanism for garbage collection in your code? If not, is it a good idea to remove files based on last access time (ls -ltu /cache1/body/v1/)? Should I also delete old files from the header and request folders?
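Absent a built-in garbage collector, the last-access-time sweep Omid describes could be sketched like this (the paths, the 30-day threshold, and the set of subdirectories are assumptions based on this thread; whether header/request entries must be pruned together is a question for Eliezer):

```shell
# Prune store files not read in the last 30 days, per subdirectory.
# Relies on the filesystem recording atime (i.e. not mounted noatime).
STORE=/cache1
for d in body header request; do
  find "$STORE/$d/v1" -type f -atime +30 -print -delete
done
```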

Re: Windows Updates a Caching Stub zone, A windows updates store.

Amos Jeffries
On 19/07/2016 10:58 p.m., Omid Kosari wrote:

> Eliezer Croitoru-2 wrote
>> Hey Omid,
>>
>> Indeed my preference is that if you can ask ask and I will try to give you
>> couple more details on the service and the subject.
>
> Hey Eliezer,
>
> 1.I have refresh patterns from days before your code . Currently i prefer
> not to store windows updates in squid internal storage because of
> deduplication . Now what should i do ? delete this refresh pattern ? or even
> create a pattern not to cache windows updates ?
>
> refresh_pattern -i
> (microsoft|windowsupdate)\.com/.*?\.(cab|exe|dll|ms[iuf]|asf|wm[va]|dat|zip|iso|psf)$
> 10080 100% 172800 ignore-no-store ignore-reload ignore-private
> ignore-must-revalidate override-expire override-lastmod
>

Either;
  cache deny ...

Or (if your Squid supports it)

  store_miss deny ...


The cache ACLs are again request-only ones. So based on dstdomain of WU
services.

The store_miss ACLs can be based on request or reply. So nice things
like reply Content-Type header etc. can be used.


If your refresh_pattern causes something to be a HIT in cache, then the
store_miss stuff will never happen of course.

Likewise, if the store_miss prevents something being added to cache the
refresh_pattern will not then be able to have any effect on its cache entry.
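Spelled out, the two options Amos names could look like this in squid.conf (a sketch reusing the wu ACL idea from earlier in the thread; store_miss requires a Squid version that supports it):

```
acl wu dstdom_regex \.download\.windowsupdate\.com$

# Request-based: do not store replies for these requests in Squid's cache
cache deny wu
cache allow all

# Reply-capable alternative, where supported (can also use rep_header ACLs):
# store_miss deny wu
```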



> 2.Is the position of your squid config important to prevent logical
> conflicts? for example should it be before above refresh patterns to prevent
> deduplication ?
>
> acl wu dstdom_regex \.download\.windowsupdate\.com$
> acl wu-rejects dstdom_regex stats
> acl GET method GET
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query
> no-netdb-exchange name=ms1
> cache_peer_access ms1 allow GET wu !wu-rejects
> cache_peer_access ms1 deny all
> never_direct allow GET wu !wu-rejects
> never_direct deny all


For these directives ordering is relevant only with regards to other
lines of the same directive name.

The exception being cache_peer_access; where the peer name field defines
which lines are a sequential group. And the cache_peer definition line
must come first.


>
> 3.Is it good idea to change your squid config as bellow to have more hits?
> Or maybe it is big mistake !
>
> acl msip dst 13.107.4.50
> acl wu dstdom_regex \.download\.windowsupdate\.com$
> \.download\.microsoft\.com$
> acl wu-rejects dstdom_regex stats
> acl GET method GET
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query
> no-netdb-exchange name=ms1
> cache_peer_access ms1 allow GET wu !wu-rejects
> cache_peer_access ms1 allow GET msip !wu-rejects
> cache_peer_access ms1 deny all
> never_direct allow GET wu !wu-rejects
> never_direct allow GET msip !wu-rejects
> never_direct deny all


Your question here is not clear. None of this config is directly related
to HITs. With Eliezer's setup, HITs are an intentional by-product of the
manipulation happening in the peer.
So you either use the peer and get whatever HITs it causes, or you don't.

>
> 4.Current storage capacity is 500G andmore than 50% of it becomes full and
> growing fast . Is there any mechanism for garbage collection in your code ?
> If not is it good idea to remove files based on last access time (ls -ltu
> /cache1/body/v1/) ? should i also delete old files from header and request
> folders ?
>

I'll leave that to Eliezer to answer.

Amos


Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
Hey Omid,

I will try to answer about the subject in general, and it should contain the
answers to what you have asked.

Windows updates can somewhat be cached by combining Squid StoreID and a
refresh_pattern.
However, the nature of Squid is to be a "cache", and in many cases, since we
can predict that we will need specific content more often, it would be
preferred to store the objects.
The "tradeoff" is using the wire and the clients to fetch the same exact
content over and over again, while assuring its consistency and integrity.
For example, most Windows updates can be publicly cached for 48 hours,
which should be enough for a "torrent" (some would call it a DoS) of updates.

A refresh_pattern which has the options ignore-no-store ignore-reload
ignore-private ignore-must-revalidate override-expire override-lastmod
will, in a way, reduce some bandwidth consumption by the clients, but it
kind of "breaks" some other features of the cache.
Since Squid 3.x there has been a software change inside Squid to prefer
integrity over caching, due to changes in the nature of the Internet.

MS are cache friendly in general, and you probably won't need
override-lastmod and a couple of other options in the refresh_pattern definition.
A refresh_pattern's location in squid.conf should not make any difference
to caching, but it is important to place them like many FW and ACL rules:
first seen and matched wins.
This is because the squid.conf parser validates the refresh_patterns one at
a time, from top to bottom of the squid.conf file.

To prevent duplication of content, as Amos advised, you should use the
"cache" config directive.
Take a peek at the docs: http://www.squid-cache.org/Doc/config/cache/
And also the example at:
http://wiki.squid-cache.org/SquidFaq/SquidAcl#how_do_I_configure_Squid_not_to_cache_a_specific_server.3F
And remember to add, after a cache deny, "cache allow all".

About the "msip" acl you have added:
It's not really needed, and it can also cause strange things if some request
for/to another domain were sent to this cache_peer.
This is due to this service's nature of returning a 500 error on requests
for non-Windows-update domains.

If you notice weird behavior with this store service, like space consumption,
there are a couple of steps you should take:
- stop the crontab of the fetcher (to rebase the situation)
- verify that there are currently no stored responses which were supposed to
be "private", i.e. use this script:
https://gist.github.com/elico/5ae8920a4fbc813b415f8304cf1786db
- verify how many unique requests are stored, i.e. "ls
/cache1/request/v1/ | wc -l"

The next step would be to examine the requests dump, i.e. "tar cvfJ
requests.tar.xz /cache1/request/v1/", and send me these dumps for analysis.
If you need to filter the requests before sending them to me, you will need
to verify whether there are cookies in the requests files.

I believe that when we have the numbers of responses with private or public
cache-control headers, it will be a much simpler starting point for the
next step.

Just to mention that a garbage collection operation should be done before
the actual full fetch.
In my experiments I couldn't find evidence of a situation like yours, but I
assumed that some networks would have issues at this level.
I will enhance my fetcher to avoid fetching private content, but to be sure
of the right move I need both the statistics and the requests dump.

My code is not a cache which manages some level of expiration or validation
of the content.
It's a simple HTTP web service embedded with a special file-system
structure and a forward proxy.

I will be available tonight on my skype: elico2013
Also on the squid IRC channel at irc.freenode.net with the nick: elico
And of course my email.
Just contact me so we can understand the situation better.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]


-----Original Message-----
From: squid-users [mailto:[hidden email]] On
Behalf Of Omid Kosari
Sent: Tuesday, July 19, 2016 1:59 PM
To: [hidden email]
Subject: Re: [squid-users] Windows Updates a Caching Stub zone, A windows
updates store.

Eliezer Croitoru-2 wrote
> Hey Omid,
>
> Indeed my preference is that if you can ask ask and I will try to give
you
> couple more details on the service and the subject.

Hey Eliezer,

1.I have refresh patterns from days before your code .
Currently i prefer not to store windows updates in squid internal storage
because of de-duplication .
Now what should i do ? delete this refresh pattern ? or even
create a pattern not to cache windows updates ?

refresh_pattern -i (microsoft|windowsupdate)\.com/.*?\.(cab|exe|dll|ms[iuf]|asf|wm[va]|dat|zip|iso|psf)$ 10080 100% 172800 ignore-no-store ignore-reload ignore-private ignore-must-revalidate override-expire override-lastmod

2. Is the position of your Squid config important to prevent logical
conflicts?
For example, should it come before the refresh patterns above to prevent
de-duplication?

acl wu dstdom_regex \.download\.windowsupdate\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query
no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct deny all

3. Is it a good idea to change your Squid config as below to get more hits?
Or maybe it is a big mistake!

acl msip dst 13.107.4.50
acl wu dstdom_regex \.download\.windowsupdate\.com$ \.download\.microsoft\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query
no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 allow GET msip !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct allow GET msip !wu-rejects
never_direct deny all

4. Current storage capacity is 500G, and more than 50% of it is already full
and growing fast.
Is there any mechanism for garbage collection in your code?
If not, is it a good idea to remove files based on last access time (ls -ltu
/cache1/body/v1/)?
Should I also delete old files from the header and request folders?




--
View this message in context:
http://squid-web-proxy-cache.1019090.n4.nabble.com/Windows-Updates-a-Caching
-Stub-zone-A-windows-updates-store-tp4678454p4678581.html
Sent from the Squid - Users mailing list archive at Nabble.com.
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users


Re: Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru
In reply to this post by Omid Kosari
Hey Omid,

After inspecting more data, I have seen a couple of cases that will result in disk space consumption.
Windows Updates supports a variety of languages, and when you have more than one or two languages the cache size grows rapidly.
To give some numbers to the picture:
- Each Windows version has multiple editions (Starter, Home, Professional, Enterprise...)
- Each CPU architecture requires its own updates (x86, x64)
- Each Windows version can have a big update for multiple languages, depending on the locale of the system
- Each Windows product, such as Office, has its own language packs and updates (some updates are huge...)

Since I am not one of Microsoft's engineers or product/updates managers, I cannot guarantee that my understanding of the subject is rock solid.
On the other hand, since I do have a background in HTTP and its structure, I can say with some assurance that my research can be followed by most if not all HTTP experts.

Squid by its nature honors specific caching rules, and these are very general.
To my understanding, Squid was not built to satisfy every use case, but it helps with many of them.
Since you also noticed that Windows updates can consume lots of disk space, what you mentioned about last-accessed time seems pretty reasonable for a cache.
You have the choice of how to manage your store/cache according to whatever is required or needed.
For example, the command:
find /cache1/body/v1/ -atime +7 -type f | wc -l

should give you the number of files that were not accessed in the last week.
We can try to enhance the above command/idea to calculate statistics in a way that helps us get an idea of which files or updates are downloaded periodically.
Currently, only through the existence of the request files can we understand which responses belong to which requests.
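As a small sketch of such statistics (a hypothetical helper, not part of the actual tool: the /cache1/body/v1 path and the 7-day window come from the command above, everything else is an assumption):

```shell
# count_stale DIR DAYS: report how many files under DIR were not
# accessed in the last DAYS days, and their total size in bytes.
count_stale() {
    find "$1" -type f -atime +"$2" -printf '%s\n' 2>/dev/null |
        awk '{ n++; b += $1 }
             END { printf "stale files: %d, total bytes: %d\n", n + 0, b + 0 }'
}

# Example: count_stale /cache1/body/v1 7
```

The same idea could later be extended to per-update statistics by grouping on file paths, once the request files are used to map responses back to URLs.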

Let me know if you want me to compose a script that will help you decide which files to purge. (I will probably write it in Ruby.)
There is an option to "blacklist" a response from being fetched by the fetcher or served by the web service, but you will need to update to the latest version of the fetcher and either use the right CLI option (I don't remember it right now) or run the command behind a "true" pipe, such as "true | /location/fetcher ...", to avoid any "pause" it would otherwise cause.
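For anyone who wants to experiment before such a script exists, here is a minimal shell sketch of an atime-based purge. It assumes the body, header, and request trees mirror each other by relative file name (the thread only shows /cache1/body/v1 and /cache1/request/v1, so verify the layout on your own store before deleting anything):

```shell
# purge_stale ROOT DAYS: delete body files not accessed in DAYS days,
# together with their companions under header/ and request/.
purge_stale() {
    root="$1"; days="$2"
    ( cd "$root/body/v1" || exit 1
      find . -type f -atime +"$days" -print ) |
    while IFS= read -r rel; do
        rm -f "$root/body/v1/$rel" \
              "$root/header/v1/$rel" \
              "$root/request/v1/$rel"
    done
}

# Example (dry-run first by replacing rm -f with echo):
# purge_stale /cache1 30
```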

Thanks,
Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]


-----Original Message-----
From: squid-users [mailto:[hidden email]] On Behalf Of Omid Kosari
Sent: Tuesday, July 19, 2016 1:59 PM
To: [hidden email]
Subject: Re: [squid-users] Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru-2 wrote
> Hey Omid,
>
> Indeed my preference is that if you can ask ask and I will try to give you
> couple more details on the service and the subject.

Hey Eliezer,
<SNIP>

4. Current storage capacity is 500G, and more than 50% of it is already full
and growing fast. Is there any mechanism for garbage collection in your code?
If not, is it a good idea to remove files based on last access time (ls -ltu
/cache1/body/v1/)? Should I also delete old files from the header and request
folders?





Re: Windows Updates a Caching Stub zone, A windows updates store.

Omid Kosari
Hi,

Thanks for the support.

Recently I have seen a problem with version beta 0.2: when the fetcher is working, the kernel logs lots of the following error:
TCP: out of memory -- consider tuning tcp_mem

I think the problem is the orphaned connections I mentioned before. I will try the new version to see what happens.
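For reference, the kernel message above fires when total TCP buffer memory crosses the tcp_mem pressure/high thresholds, which a large number of half-closed or orphaned connections can cause. A hedged example of what tuning might look like (the values are purely illustrative, expressed in 4 KB pages; size them to your RAM, and note this only buys headroom, it does not fix a connection leak):

```
# /etc/sysctl.conf fragment -- example values only (low / pressure / high,
# in pages). Apply with `sysctl -p`.
net.ipv4.tcp_mem = 786432 1048576 1572864

# While debugging, /proc/net/sockstat shows current TCP memory use and the
# orphan count, which can confirm the orphaned-connection theory.
```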

Also, I have a feature request: please provide a configuration file, for example in /etc/foldername or even beside the binary files, with selective options for both the fetcher and the logger.

I have seen the following changelog:
beta 0.3 - 19/07/2016
+ Upgraded the fetcher to honour private and no-store cache-control headers when fetching objects.

From my point of view, more hits are better, and there is no problem storing private and no-store objects if it helps achieve more hits and bandwidth savings. So it would be nice to have an option in the mentioned config file to change this myself.

Thanks again