Caching mirrored origin server

jimc
I'm using squid-4.4-2.1.x86_64 from OpenSuSE Tumbleweed.  My goal is
that when doing periodic software updates, each host in my department
will contact my proxy to obtain the new metadata and packages (SuSE has
a syntax for this); the proxy will download each file only once.  This
sounds like pretty standard Squid operation, but there's a gross botfly
in the ointment: the origin servers return 302 Found, each time
redirecting to a different mirror, and with "normal" configuration this
result is passed back to the client which makes a new connection (via
the proxy) to that mirror, but the retrieved file will likely never be
accessed again from that mirror.
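
To make the effect concrete, here is a toy model in Python (not Squid
code).  It assumes the origin hands out mirrors in simple rotation, which
is a simplification, and the mirror host names and the simulate()
function are made up for illustration.  It counts how many downloads are
needed when the cache is keyed by the full mirror URL versus by the path
alone.

```python
import itertools

# Hypothetical mirror hosts; the real set changes over time.
MIRRORS = ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]


def simulate(clients, path, key_by_path):
    """Return how many times the file is fetched from a mirror."""
    cache = set()
    downloads = 0
    # Stand-in for the origin's 302 choice: assume plain rotation.
    rotation = itertools.cycle(MIRRORS)
    for _ in range(clients):
        mirror_url = "http://%s/%s" % (next(rotation), path)
        # Keying by the mirror URL mimics the "normal" configuration;
        # keying by path mimics one shared cache entry for all mirrors.
        key = path if key_by_path else mirror_url
        if key not in cache:
            cache.add(key)
            downloads += 1
    return downloads


path = "tumbleweed/repo/oss/x86_64/foo.rpm"
print(simulate(6, path, key_by_path=False))  # one download per distinct mirror
print(simulate(6, path, key_by_path=True))   # a single download
```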

Is there anyone who really knows what Squid can do and who has an idea
how to get Squid to follow the redirects on a cache miss, and to respond
to the original URL with the cached copy, whichever mirror it came from?

https://ma.ttwagner.com/lazy-distro-mirrors-with-squid/ by Matt Wagner
(2014-04-03) has some suggestions along this line.  He fixates on one
mirror per origin server, and uses the virtual host feature to
effectively do an internal redirect to that mirror.  I also tried that
solution for a while, but mirrors come and go, and curating the mirror
selection turned out to be a reliability problem, so I gave it up.

Software package RPMs are often large, and in my application it's
important to raise the maximum_object_size; the largest item I've seen
is 158 MB and I've set the limit to 300 MB.  And of course enough disc
cache space has to be allowed to hold the expected set of RPMs to be
used, e.g. when installing the OS on several new machines.  My
cache_dir has 5000 MB.

--
James F. Carter   Email: [hidden email]
Web: http://www.math.ucla.edu/~jimc (q.v. for PGP key)


_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users


Re: Caching mirrored origin server

Alex Rousskov
On 1/2/19 3:01 PM, jimc wrote:

> I'm using squid-4.4-2.1.x86_64 from OpenSuSE Tumbleweed.  My goal is
> that when doing periodic software updates. each host in my department
> will contact my proxy to obtain the new metadata and packages (SuSE has
> a syntax for this); the proxy will download each file only once.  This
> sounds like pretty standard Squid operation, but there's a gross botfly
> in the ointment: the origin servers return 302 Found, each time
> redirecting to a different mirror, and with "normal" configuration this
> result is passed back to the client which makes a new connection (via
> the proxy) to that mirror, but the retrieved file will likely never be
> accessed again from that mirror.

The default solution for mapping many URLs to a single cache hit is the
store_id helper. That solution is only applicable to URLs that produce
the same content regardless of the URL from the set.

  * https://wiki.squid-cache.org/Features/StoreID
  * http://www.squid-cache.org/Doc/config/store_id_program/


> mirrors come and go, and curating the mirror
> selection turned out to be a reliability problem, so I gave it up.

If there is a common pattern to all mirrors for a given URL, then
store_id can help. You would still be responsible for curating the URL
mapping/patterns, of course, but it may be (more) manageable.
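
As a rough illustration of that approach, here is a minimal store_id
helper sketch in Python.  It assumes the helper protocol described on the
StoreID wiki page: one request per line, an optional numeric channel ID
in front of the URL, and replies of "OK store-id=..." or "ERR".  The
regular expression, the ".squid.internal" canonical host name, and the
function names are hypothetical and would have to be adapted to the
mirror URLs that actually show up in your logs.

```python
import re
import sys

# Hypothetical pattern: any host, an optional path prefix, then an
# openSUSE-style "tumbleweed/repo/..." path ending in .rpm or .xml.gz.
# Real mirror layouts differ and must be checked by hand.
MIRROR_PATH = re.compile(
    r"^https?://[^/]+/(?:.*/)?(tumbleweed/repo/.+\.(?:rpm|xml\.gz))$"
)

# Assumed canonical prefix; the host name is invented and never contacted.
CANONICAL_PREFIX = "http://opensuse-mirrors.squid.internal/"


def store_id_for(url):
    """Map a mirror URL to a shared store ID, or None if it does not match."""
    match = MIRROR_PATH.match(url)
    if match is None:
        return None
    return CANONICAL_PREFIX + match.group(1)


def answer(line):
    """Build the helper reply for one request line from Squid."""
    fields = line.split()
    if not fields:
        return "ERR"
    channel = ""
    # With helper concurrency enabled, each request starts with a numeric
    # channel ID that must be echoed back in the reply.
    if fields[0].isdigit() and len(fields) > 1:
        channel = fields[0] + " "
        fields = fields[1:]
    store_id = store_id_for(fields[0])  # any extra fields are ignored
    if store_id is None:
        return channel + "ERR"
    return channel + "OK store-id=" + store_id


def serve(requests=sys.stdin, replies=sys.stdout):
    """Answer requests line by line until Squid closes the pipe."""
    for line in requests:
        replies.write(answer(line) + "\n")
        replies.flush()  # Squid waits for each reply, so do not buffer


# Small demonstration; a deployed helper would call serve() instead.
print(answer("http://mirror-a.example.org/pub/opensuse/tumbleweed/repo/oss/x86_64/foo-1.0.rpm"))
print(answer("7 http://mirror-b.example.net/tumbleweed/repo/oss/x86_64/foo-1.0.rpm"))
print(answer("http://www.example.com/index.html"))
```

A script like this would be wired in with the store_id_program
directive, with store_id_children controlling the number of helper
processes.  Every URL that matches the pattern is treated as the same
object, so an overly loose pattern will serve wrong content.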

There is no built-in "follow redirect but cache under the original URL"
feature in Squid, probably because such a feature would result in
serving wrong responses in many typical cases. With store_id, the
decision to map URLs and the headaches/risks of doing so are all on your
side.

With some Squid development work, the missing feature can be implemented
on top of the existing adaptation interfaces and the core store_id
functionality, but nobody has done that so far IIRC.


HTH,

Alex.

Re: Caching mirrored origin server

jimc
On 2019-01-03 08:34, Alex Rousskov wrote:
> The default solution for mapping many URLs to a single cache hit is the
> store_id helper. That solution is only applicable to URLs that produce
> the same content regardless of the URL from the set.
>
>   * https://wiki.squid-cache.org/Features/StoreID
>   * http://www.squid-cache.org/Doc/config/store_id_program/

@Alex, thanks for the pointer to the StoreID feature.  I'll try to adapt
one of the sample programs in the docs.  The database of distro mirror
patterns mentioned there will be very helpful.

--
James F. Carter   Email: [hidden email]
Web: http://www.math.ucla.edu/~jimc (q.v. for PGP key)

Re: Caching mirrored origin server

Eliezer Croitoru
The DB of distro mirrors on the wiki is not up to date, but it's a nice example.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]

