Ideas for better caching these popular urls

Ideas for better caching these popular urls

Omid Kosari
Hello,

squid-top-domains.JPG
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t93386/squid-top-domains.JPG>  

This image shows stats from one of my Squid boxes. I have a question about
the highlighted ones: I think they should have a better hit ratio because
they are popular among clients.
I have checked a lot of things, like Calamaris and the logs, and played with
refresh_pattern, Store-ID rules, etc.

I would like the gurus and the community to please help me get better HITs.

Also, I am ready to share specific parts of access.log and other data if
requested.

Thanks




Re: Ideas for better caching these popular urls

Amos Jeffries
On 10/04/18 22:32, Omid Kosari wrote:
> Hello,
>
> squid-top-domains.JPG
> <http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t93386/squid-top-domains.JPG>  
>
> This image shows stats from one of my Squid boxes. I have a question about
> the highlighted ones: I think they should have a better hit ratio because
> they are popular among clients.

There are no URLs in that image. There are only wildcards for top-level
domains and a HIT % over the *entire* domain.

To figure out whether any of them should actually have better HIT ratios
you have to look at the actual URLs and see how much uniqueness exists
there.

Then, for the _full_ URLs (scheme, domain, path, *and* ?query portions)
which are not very unique, look at the response headers to see why they
are not caching well. The tool at redbot.org can help with that last part.
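
For example, a rough first pass against a default-format access.log (the URL
is field 7 in the default "squid" logformat, so adjust the field number and
the log path if yours differ):

# Total requests vs. distinct full URLs for one domain; a large gap means
# the same URLs repeat and should be cacheable in principle.
awk '$7 ~ /playstation\.net/ {print $7}' /var/log/squid/access.log | wc -l
awk '$7 ~ /playstation\.net/ {print $7}' /var/log/squid/access.log | sort -u | wc -l

# The 20 most requested full URLs for that domain.
awk '$7 ~ /playstation\.net/ {print $7}' /var/log/squid/access.log \
  | sort | uniq -c | sort -rn | head -20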

Amos

Re: Ideas for better caching these popular urls

Omid Kosari
Thanks for the reply.

I assumed that the community, at different scales from small ISPs to large
ISPs, may have common domains like the ones I highlighted and so may have the
same issue as mine, so I left out the common parts.

One of the problems with redbot is that it times out on big files like:

http://gs2.ww.prod.dl.playstation.net/gs2/appkgo/prod/CUSA00900_00/2/f_2df8e321f37e2f5ea3930f6af4e9571144916013ee38893d881890b454b5fed6/f/UP9000-CUSA00900_00-BLOODBORNE000000_4.pkg?downloadId=00000187&du=000000000000018700e2291bda0f868f&country=us&downloadType=ob&q=aa2cd9c8d1f359feb843ae4a6c99cfcdb6569ca9cc60ad6d28b6f8de3b5fac23&threadId=0&serverIpAddr=23.57.69.81&r=00000027

http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA07557_00/25/f_053bab8c9dec6fbc68a0bd9fc58793285ae350ccf7dadacb35b5840228a9d802/f/EP4001-CUSA07557_00-F12017EMASTER000-A0113-V0100_0.pkg?downloadId=00000059&du=000000000000005900e22977e62f91a2&downloadType=ob&product=0183&serverIpAddr=8.248.5.254&r=00000032


I assumed anyone with a few thousand users may have the same problem, and
maybe they would like to share, for example, their refresh_pattern or
Store-ID rules to solve it. After all, PlayStation is everywhere ;)

Here is part of the storeid_db file:
^http:\/\/.*\.sonycoment\.loris-e\.llnwd\.net\/(.*?\.pkg)    http://playstation.net.squidinternal/$1
^http:\/\/.*\.playstation\.net\/(.*?\.pkg)    http://playstation.net.squidinternal/$1

Almost all of the huge PlayStation downloads come back with a 206 status
code, but the file is downloaded from start to end; if I remember correctly,
Squid will correctly cache the file in this situation.
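
For context, a minimal sketch of the squid.conf side that goes with such a db
(the helper path, the "psn_pkg" ACL name and all values below are
illustrative, not my exact config):

# Wire the Store-ID db in and restrict it to the relevant traffic.
store_id_program /usr/lib/squid/storeid_file_rewrite /etc/squid/storeid_db
store_id_children 10 startup=5 idle=3
acl psn_pkg url_regex -i \.playstation\.net/.*\.pkg
store_id_access allow psn_pkg
store_id_access deny all

# Fetch the whole object even when the client asks for ranges, and keep
# fetching if the client aborts, so the 206 traffic ends up fully cached.
range_offset_limit none psn_pkg
quick_abort_min -1 KB

# Cache .pkg files for up to 90 days despite weak freshness headers.
refresh_pattern -i \.pkg(\?.*)?$ 129600 100% 129600 override-expire override-lastmod ignore-reload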




Re: Ideas for better caching these popular urls

Eliezer Croitoru
Hey Omid,

From what I remember of basic math, to verify that a specific set of numbers follows some kind of pattern you need at least 3 items.
But in the cryptography world it is another story.
I have not researched PlayStation downloads and probably won't do that.
Others might offer some help, but you must understand what you are trying to predict in these URLs and downloads.
From what I have seen, it seems that this CDN, "llnwd.net", is very cache friendly, but you need to know how to handle their traffic.
They don't use any form of ETag headers, but they do provide some pieces of information in the URLs that can identify something about the object.
If they use a ticketing system, as a couple of other CDN providers do, you would need to know the "ID" of the URL before it is downloaded.
You will need more than just the URLs; you will also need the response headers for these.
I might be able to write an ICAP service that logs request and response headers; it could help cache admins improve their efficiency, but this can take a while.
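
In the meantime the response headers for one of those sample URLs can be
collected by hand, for example (the URL variable should hold one of the full
.pkg links from the previous mail, and 127.0.0.1:3128 is only a placeholder
for the proxy address):

# Paste one of the full .pkg URLs here; the value below is just a placeholder.
URL='http://gs2.ww.prod.dl.playstation.net/...'

# Headers straight from the CDN: Cache-Control, Expires, Last-Modified,
# Accept-Ranges and Content-Length are the interesting ones.
curl -sI "$URL"

# The same request through Squid: X-Cache and Age show whether it was a HIT.
curl -sI --proxy http://127.0.0.1:3128 "$URL"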

All the best,
Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]




Re: Ideas for better caching these popular urls

Omid Kosari
Eliezer Croitoru wrote:
> You will need more than just the URLs; you will also need the response
> headers for these.
> I might be able to write an ICAP service that logs request and response
> headers; it could help cache admins improve their efficiency, but this can
> take a while.

Hi Eliezer,

Nice idea. I am ready to test/help/share whatever you need in a real
production environment. Please also make it general so it covers the other
domains from the attachment in my first post; they are worth a try.

Thanks





Re: Ideas for better caching these popular urls

Eliezer Croitoru
Hey Omid,

I will try to use a file format similar to this:
## FILENAME = unixtime-sha256
RESPMOD icap://127.0.0.1:1344/dumper ICAP/1.0
date: Wed, 11 Apr 2018 16:52:13 GMT
encapsulated: req-hdr=0, res-hdr=105, res-body=413
preview: 0
allow: 204
host: 127.0.0.1:1344
Socket-Remote-Addr: 127.0.0.1:55178

GET http://ngtech.co.il/index.html HTTP/1.1
Accept: */*
User-Agent: curl/7.29.0

HTTP/1.1 200 OK
Content-Length: 17230
Accept-Ranges: bytes
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Origin: *
Content-Type: text/html
Date: Wed, 11 Apr 2018 16:52:13 GMT
Last-Modified: Tue, 03 Apr 2018 20:19:05 GMT
Server: nginx/1.10.3 (Ubuntu)
Vary: Accept-Encoding
## EOF

I have a prototype that I wrote three years ago, but it needs to be polished for general use.
I will post an update when I have some progress.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]




Re: Ideas for better caching these popular urls

Eliezer Croitoru
Hey Omid,

I found the service I wrote and packaged it as an RPM at:
http://ngtech.co.il/repo/centos/7/x86_64/response-dumper-icap-1.0.0-1.el7.centos.x86_64.rpm

If you are using another OS, let me know and I will try to package it for that OS.
Currently alien converts the RPM smoothly on Debian/Ubuntu.

The dumps directory is at:
/var/response-dumper

But the cleanup and the filtering ACLs are your job.
You can define which GET requests the service dumps/logs into the files.
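
For reference, hooking the dumper into squid.conf and limiting it to the
interesting domains could look roughly like this (the service name and the
ACL below are placeholders; the ICAP URL matches the example from my previous
mail):

# Sketch only; restrict dumping to the domains you actually care about.
icap_enable on
acl dump_domains dstdomain .playstation.net .llnwd.net
icap_service response_dumper respmod_precache icap://127.0.0.1:1344/dumper bypass=1
adaptation_access response_dumper allow dump_domains
adaptation_access response_dumper deny all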
Each individual file in this directory will be named in the following format:
<int epoch time>-<8-byte uuid>-<md5(GET:full url)>

This format allows multiple requests to happen at the same time and still get
different file names, while the URL hash stays the same, so you can filter
files by it.
To calculate the hash of a URL, use:
$ echo -n "GET:http://url-to-hash.com/path?query=terms" | md5sum
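
So, to list every dump that was recorded for one specific URL, something like
this should do (the example.com URL is just a placeholder):

# The md5 of "GET:" plus the full URL matches the last part of the file names.
H=$(echo -n 'GET:http://example.com/some/file.pkg?x=1' | md5sum | cut -d' ' -f1)
ls /var/response-dumper/ | grep "$H"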

Each and every file contains the full ICAP RESPMOD details, i.e.:
ICAP request\r\n
HTTP request\r\n
HTTP response\r\n

By default, Cookie and Authorization headers are censored from both the
request and the response in the dump, to avoid some privacy-law issues.

Now the only missing RedBot feature is a way to feed it a single request and a single response to get a full analysis.

Let me know if it works OK for you (it has been working fine here for a while now).

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]

