Squid vs caching products like memcached

lightbulb432
What’s the difference between the reverse proxying features of Squid and a caching product like memcached?

I don’t necessarily mean specific comparisons of both products (e.g. performance), but rather explanations of what both types of products do. I understand that there are some large-scale websites out there that make use of both, so clearly they are better at different things and both have a place in a given architecture.

As a newbie, however, I’m unable to determine how they both come together and fit into an architecture to make a website more scalable, so your help would be really appreciated.

Thanks.

Re: Squid vs caching products like memcached

Jose Celestino
Words by lightbulb432 [Wed, May 16, 2007 at 08:16:44AM -0700]:
>
> What’s the difference between the reverse proxying features of Squid and a
> caching product like memcached?
>

Memcached has nothing to do with proxying. Squid talks HTTP and caches
HTTP objects; memcached talks the memcached protocol and caches objects
(any key/value you want). Memcached is not related (directly, at
least) to HTTP. It is just a cache engine, and you have to program around
it to turn it into something useful.

>
> I don’t necessarily mean specific comparisons of both products (e.g.
> performance), but rather explanations of what both types of products do. I
> understand that there are some large-scale websites out there that make use
> of both, so clearly they are better at different things and both have a
> place in a given architecture.
>

Yes. First of all, Squid is something you put between the client and the web
server. Memcached is something you put between your web servers and your
database/filesystem/whatever; it stays on the backend.

--
Jose Celestino
----------------------------------------------------------------
http://www.msversus.org/     ; http://techp.org/petition/show/1
http://www.vinc17.org/noswpat.en.html
----------------------------------------------------------------
"And on the trillionth day, Man created Gods." -- Thomas D. Pate

Re: Squid vs caching products like memcached

Sean Walberg
In reply to this post by lightbulb432
On 5/16/07, lightbulb432 <[hidden email]> wrote:

> What's the difference between the reverse proxying features of Squid and a
> caching product like memcached?

Memcached is a distributed, in-memory hash table with a network
interface.  Most often you stuff the results of expensive queries
(database, computations, XML processing) into memcached so that
multiple nodes can get the data without having to do the expensive
query.
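
To make "distributed hash table" concrete, here is a minimal sketch of how a client might spread keys across several cache nodes. It is an illustration under assumptions, not memcached's actual client code: the node addresses and the `TinyCluster` class are hypothetical, each node is simulated with a plain dict, and real clients often use consistent hashing rather than a simple modulo.

```python
import zlib

# Hypothetical node addresses, for illustration only.
NODES = ["cache1:11211", "cache2:11211", "cache3:11211"]


def node_for(key, nodes=NODES):
    """Pick the node responsible for a key by hashing the key."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


class TinyCluster:
    """In-process stand-in for a set of cache nodes (one dict per node)."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.stores = {name: {} for name in self.nodes}

    def set(self, key, value):
        # The key alone decides which node stores the value.
        self.stores[node_for(key, self.nodes)][key] = value

    def get(self, key):
        # Any web server hashing the same key reaches the same node.
        return self.stores[node_for(key, self.nodes)].get(key)
```

Because every web server computes the same node for the same key, a value stored by one server can be read by all the others.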

Squid as a reverse proxy caches HTTP objects -- pages, CSS,
JavaScript, images, etc.  You use Squid to offload entire requests from
your web server.

As an example at b5media we front our web farm with Squid, but only
cache images, javascript, and CSS.  WordPress and Squid don't play
well together because WordPress doesn't send out proper headers for
Squid to use, so we don't cache pages.

Besides taking hits off the web server, Squid is also good at spoon-
feeding slow clients.  Previously, a slow client kept an expensive
Apache slot tied up; now Squid takes that data from Apache and feeds
the client -- Squid is more efficient at this task than Apache.

On the WordPress backend we store a lot of internal stuff in
memcached.  We have some internal code that uses a REST API that
figures out what blog goes in which channel.  Rather than make a
handful of REST calls on every page view, which incurs latency for the
web hit and CPU for the XML processing, we check memcached to see if
the PHP object exists.  If we get a memcached cache hit we've just
saved ourselves a lot of time.  If we get a miss, we make the API
calls and stuff the PHP object back into memcached for the next
person.
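
The check-then-fill flow described above can be sketched roughly as follows. This is an illustration in Python rather than the PHP code in question: `TinyCache` is a hypothetical in-process stand-in for a memcached client, and `fetch_channel_from_api`, the key format, and the 300-second TTL are assumptions made for the example.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl else None
        self._data[key] = (value, expires_at)


def fetch_channel_from_api(blog_id):
    """Hypothetical expensive call (REST requests plus XML parsing)."""
    fetch_channel_from_api.calls += 1
    return {"blog_id": blog_id, "channel": "channel-for-%d" % blog_id}


fetch_channel_from_api.calls = 0  # counts how often the slow path runs


def get_channel(cache, blog_id, ttl=300):
    key = "channel:%d" % blog_id
    value = cache.get(key)
    if value is None:
        # Cache miss: do the expensive work once, then store the result
        # so the next request can skip it.
        value = fetch_channel_from_api(blog_id)
        cache.set(key, value, ttl)
    return value
```

Calling `get_channel(cache, 7)` twice in a row runs the expensive function only once; the second call is served from the cache.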

I look at caching within a LAMP application as a multilevel thing.
We cache the HTTP objects we can with Squid.  If we have to generate
a page, we cache what we can in memcached, just like we cache compiled
PHP scripts in an opcode cache.  If we have to hit the database, we use
MySQL query caching at that layer.

It's not a one-or-the-other type of thing; these are two tools that
are clearly the best at what they do, and can (should?) be used
together as part of a good architecture.

Sean

--
Sean Walberg <[hidden email]>    http://ertw.com/

Re: Squid vs caching products like memcached

lightbulb432
Great answer, thanks!

How does Squid's page caching work for pages that were generated dynamically but should be cached as though they were static?

For example, Amazon.com's homepage is dynamic but not generated dynamically on each request for that page; rather, I assume they set it to be cached anytime a request for that page comes in, with some sort of expiration policy (e.g. only dynamically generate the homepage once an hour, then serve that cached static page for the rest of that hour).

I really hope Squid makes such a configuration possible and easy.

Thanks.



Re: Re: Squid vs caching products like memcached

Jose Celestino
Words by lightbulb432 [Wed, May 16, 2007 at 10:39:29AM -0700]:

>
> Great answer, thanks!
>
> How does Squid's page caching work for pages that were generated
> dynamically but should be cached as though they were static?
>
> For example, Amazon.com's homepage is dynamic but not generated dynamically
> on each request for that page; rather, I assume they set it to be cached
> anytime a request for that page comes in, with some sort of expiration
> policy (e.g. only dynamically generate the homepage once an hour, then serve
> that cached static page for the rest of that hour).
>
> I really hope Squid makes such a configuration possible and easy.
>

Yes. That's the basics :)
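
As a rough sketch of the "regenerate once an hour" idea: the usual approach is for the origin application to mark its response as fresh for an hour with standard HTTP headers, which a caching reverse proxy such as Squid can then honor. The WSGI app below is only an illustration in Python; `render_homepage` is a hypothetical placeholder and the one-hour lifetime comes from the example in the question.

```python
import time
from email.utils import formatdate

ONE_HOUR = 3600  # freshness lifetime in seconds


def render_homepage():
    """Hypothetical placeholder for expensive dynamic page generation."""
    return b"<html><body>Homepage</body></html>"


def app(environ, start_response):
    """Minimal WSGI app that lets shared caches reuse the page for an hour."""
    body = render_homepage()
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Shared caches may serve this response for up to an hour.
        ("Cache-Control", "public, max-age=%d" % ONE_HOUR),
        # Older HTTP/1.0 caches read Expires instead of Cache-Control.
        ("Expires", formatdate(time.time() + ONE_HOUR, usegmt=True)),
    ]
    start_response("200 OK", headers)
    return [body]
```

With headers like these, the cache in front of the web server answers repeat requests itself and only goes back to the application once the hour has passed.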

--
Jose Celestino

Re: Re: Squid vs caching products like memcached

lightbulb432
So are you saying that it is possible and quite basic to do this with Squid?

My understanding is that Squid can cache static objects, but I am unsure whether it can cache entire dynamically generated pages (not just the static content like images and stylesheets contained within those pages) under custom expiration rules like the one I described in my previous post about Amazon.com.



Re: Re: Squid vs caching products like memcached

Chris Robertson-2
lightbulb432 wrote:
> So are you saying that it is possible and quite basic to do this with Squid?
>
> My understanding is that Squid can cache static objects, but I am unsure
> whether it can cache entire dynamically generated pages (not just the
> static content like images and stylesheets contained within those pages)
> under custom expiration rules like the one I described in my previous
> post about Amazon.com.
>  

Read and be enlightened:

http://www.mnot.net/cache_docs/

Chris

Re: Squid vs caching products like memcached

Adrian Chadd
In reply to this post by lightbulb432
On Wed, May 16, 2007, lightbulb432 wrote:

>
> Great answer, thanks!
>
> How does Squid's page caching work for pages that were generated
> dynamically but should be cached as though they were static?
>
> For example, Amazon.com's homepage is dynamic but not generated dynamically
> on each request for that page; rather, I assume they set it to be cached
> anytime a request for that page comes in, with some sort of expiration
> policy (e.g. only dynamically generate the homepage once an hour, then serve
> that cached static page for the rest of that hour).
>
> I really hope Squid makes such a configuration possible and easy.

You'd probably be surprised - sites seem happy to assemble their PHP pages
almost every time, and try to use various constructs to cache the data used
to create the page (RSS, XML, SQL, etc.)

Authors of dynamic pages need to build in some behaviours that
are cache-friendly. It's not impossible; it just requires a little smart
thinking whilst designing stuff.

The Squid homepage at the moment is assembled via PHP, but it:

* Assembles a Last-Modified header based on the datestamp of the "bits" of the
  page (it takes the most recent timestamp among the elements and uses that
  as LM - including the page content, the header/footer, and the generation script.)
* Generates an ETag based on the above Last-Modified header
* Handles If-Modified-Since
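
The three behaviours in the list above could be sketched along these lines. This is a Python illustration under assumptions, not the Squid homepage's actual PHP: the function names are hypothetical, the timestamps of the page parts are passed in as plain numbers, and hashing the Last-Modified string is just one possible way to derive an ETag from it.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def validators(part_mtimes):
    """Return (timestamp, Last-Modified header, ETag) for a page.

    part_mtimes holds the modification times (seconds since the epoch)
    of the content, header/footer, and generation script.
    """
    last_modified = int(max(part_mtimes))  # most recent part wins
    lm_header = formatdate(last_modified, usegmt=True)
    # Assumption: derive the ETag by hashing the Last-Modified value.
    digest = hashlib.sha256(lm_header.encode("ascii")).hexdigest()[:16]
    return last_modified, lm_header, '"%s"' % digest


def respond(part_mtimes, if_modified_since=None):
    """Return (status, headers), honouring If-Modified-Since."""
    last_modified, lm_header, etag = validators(part_mtimes)
    headers = {"Last-Modified": lm_header, "ETag": etag}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None and last_modified <= since:
            # Nothing changed since the cached copy: send no body.
            return "304 Not Modified", headers
    return "200 OK", headers
```

A cache that already holds the page sends back the Last-Modified value it saw; if no part of the page has changed since then, the origin answers with a bodiless 304 and the cached copy is reused.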

I'm sure there's more that can be done - I'll be looking into what else can be
done if/when I dynamically pull in RSS for a "news" section, for example -
but you have to keep in mind you're trying to improve user experience.

Most sites seem to concentrate on improving user experience through larger
pipes to the internet and employing CDNs. (There's some "game" to being able
to say you push large amounts of traffic; it seems to pull funding.)
You can also improve user experience by ensuring the relevant bits of your
page are cacheable with the right options - and even if that requires
revalidation (a couple of RTTs for the TCP setup, then request/reply), you're
still saving on the object transfer time.
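
A back-of-the-envelope sketch of that saving, with made-up numbers: the 100 ms round-trip time, 100 KB object, and 1 Mbit/s link are assumptions chosen only to make the arithmetic easy, and the model ignores details such as TCP slow start and header sizes.

```python
def revalidation_time(rtt, round_trips=2):
    """Time for TCP setup plus a request/304 reply, in seconds."""
    return round_trips * rtt


def full_fetch_time(rtt, size_bytes, bytes_per_second, round_trips=2):
    """Same round trips, plus the time to transfer the object body."""
    return round_trips * rtt + size_bytes / bytes_per_second


# Assumed values, for illustration only.
RTT = 0.1             # 100 ms round trip
SIZE = 100_000        # 100 KB object
BANDWIDTH = 125_000   # 1 Mbit/s expressed in bytes per second

reval = revalidation_time(RTT)
full = full_fetch_time(RTT, SIZE, BANDWIDTH)
print("revalidation: %.2f s, full fetch: %.2f s, saved: %.2f s"
      % (reval, full, full - reval))
```

Under these assumptions the round trips cost the same either way, so the whole saving comes from not resending the object body.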




Adrian
