HDD/RAM Capacity vs store_avg_object_size

bugreporter
Hi,

Can anybody help me confirm my understanding of memory usage versus the persistent cache capacity? Here is my understanding:

According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:

1- We need 14 MB of memory per 1 GB on disk for 64-bit Squid. The wiki page has been there since I first knew Squid (i.e. I'm very old now). Is this information still valid?

2- Is this assumption based on the default value of 13 KB for store_avg_object_size?

3- If the answers to the questions above are both YES, can we deduce that we need 182 bytes in memory per object in the persistent cache on a 64-bit system? [182 = (14 * 1024 * 1024) / (1024 * 1024 / store_avg_object_size), with store_avg_object_size in KB] (see the sketch below)

4- Today store_avg_object_size should really be greater than 13 KB. The mean object size I can see on my own cache is about 100 KB. Can anybody refer me to a website where I can find fresh information?

5- If I'm completely on the wrong track, can anybody help me find a formula to deduce the required RAM for a given HDD capacity (and vice versa)?
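
To make question 3 concrete, here is the arithmetic as a quick Python sketch (the 13 KB default and the 14 MB-per-GB figure are the wiki's numbers):

    # Quick check of the arithmetic in question 3 (numbers from the wiki page).
    ram_per_gb = 14 * 1024 * 1024                    # 14 MB of index RAM per 1 GB of disk, in bytes
    avg_object_kb = 13                               # default store_avg_object_size, in KB
    objects_per_gb = (1024 * 1024) / avg_object_kb   # ~80,660 objects fit in 1 GB
    print(ram_per_gb / objects_per_gb)               # ~182 bytes of RAM per cached object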

Warm Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Alex Rousskov
On 07/12/2017 04:31 AM, bugreporter wrote:

> Can anybody help me confirm my understanding of memory usage versus the
> persistent cache capacity? Here is my understanding:
>
> According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:
>
> 1- We need 14 MB of memory per 1 GB on disk for 64-bit Squid. The wiki
> page has been there since I first knew Squid (i.e. I'm very old now). Is
> this information still valid?
>
> 2- Is this assumption based on the default value of 13 KB for
> *store_avg_object_size*?
>
> 3- If the answers to the questions above are both YES, can we deduce that
> we need *182* bytes in memory per object in the persistent cache on a
> 64-bit system? [*182* = (14 * 1024 * 1024) / (1024 * 1024 /
> store_avg_object_size), with store_avg_object_size in KB]
>
> 4- Today *store_avg_object_size* should really be greater than 13 KB. The
> mean object size I can see on my own cache is about 100 KB. Can anybody
> refer me to a website where I can find fresh information?
>
> 5- If I'm completely on the wrong track, can anybody help me find a
> formula to deduce the required RAM for a given HDD capacity (and vice
> versa)?

I cannot answer your questions without doing research, but I can supply
the following additional information:

* The amount of RAM used for shared (rock) cache_dirs is usually very
different from the amount of RAM used for SMP-unaware ufs-based
cache_dirs. The wiki page was written before Rock support was added.

* For ufs, you can test any formula/hypothesis by filling a disk cache
(with dummy/test objects) and measuring Squid RAM usage. The RAM usage
growth due to the cache_dir index should be linear, so it is fairly easy
to measure (see the sketch after this list).

* For rock, you can test any formula/hypothesis by configuring your disk
caches and starting SMP Squid. The shared memory tables are created at
start time so, if you know what you are doing, you can probably see how
big they are without filling the disk cache.
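
For example, a rough Python sketch of the ufs measurement (everything here is an assumption: Squid listening on 127.0.0.1:3128, an origin at http://test.example/ that you control and whose responses are cacheable, and a GNU ps):

    import subprocess
    import urllib.request

    # Send requests through the proxy so the responses land in the disk cache.
    proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:3128"})
    opener = urllib.request.build_opener(proxy)

    def squid_rss_kb():
        # Sum the resident set size (in KB) of all processes named "squid".
        out = subprocess.check_output(["ps", "-C", "squid", "-o", "rss="], text=True)
        return sum(int(line) for line in out.split())

    for batch in (1_000, 10_000, 100_000):
        for i in range(batch):
            # Distinct URLs, so each response becomes a separate cache entry.
            opener.open(f"http://test.example/obj-{batch}-{i}").read()
        print(batch, squid_rss_kb())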


Please update the wiki if you find any documentation bugs.

Alex.

Re: HDD/RAM Capacity vs store_avg_object_size

Amos Jeffries
In reply to this post by bugreporter
On 12/07/17 22:31, bugreporter wrote:

> Hi,
>
> Can anybody help me confirm my understanding of memory usage versus the
> persistent cache capacity? Here is my understanding:
>
> According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:
>
> 1- We need 14 MB of memory per 1 GB on disk for 64-bit Squid. The wiki
> page has been there since I first knew Squid (i.e. I'm very old now). Is
> this information still valid?

Yes. It is a rough estimate based on the size of code objects used to
store each request message - they have not changed in at least the past
10 years. There may be some variance due to the extra headers modern HTTP
messages contain, but that is not a huge amount, and the number is a
rough estimate to begin with.



>
> 2- Is this assumption based on the default value of 13 KB for
> *store_avg_object_size*?

No.

That avg object size is for the full object with payload. Those payloads
are stored inside cache_mem or cache_dir, and do not take up index
space; their total limit is whatever you configure those storage areas
to be.

Squid uses the above directive for its startup initialization of the
index's hash table. The table can be resized dynamically, but that is
quite expensive in terms of CPU cycles and would delay some requests, so
this is a nice shortcut to avoid most pauses.


The 10 or 14 MB is purely for the metadata necessary to index those
cached objects: the HTTP message header text plus a bunch of Squid code
objects.


>
> 3- If the answers to the questions above are both YES, can we deduce that
> we need *182* bytes in memory per object in the persistent cache on a
> 64-bit system? [*182* = (14 * 1024 * 1024) / (1024 * 1024 /
> store_avg_object_size), with store_avg_object_size in KB]

If you want to redo the calculations for your own proxy, start with the
values from the cachemgr "mem" report.

To get the metadata size, add the per-object sizes (the first number
column) of HttpReply + MemObject + HttpHeaderEntry + all objects whose
name starts with HttpHdr* + StoreEntry + all objects whose name starts
with StoreMeta*.
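
A minimal Python sketch of that summing step, run over a saved copy of the report ("squidclient mgr:mem > mem.txt"); it assumes the pool name is the first column and that the first numeric column after it is the per-object size in bytes - check your report's layout first:

    import re

    WANTED = re.compile(r"^(HttpReply|MemObject|HttpHeaderEntry|HttpHdr\w*|"
                        r"StoreEntry|StoreMeta\w*)$")

    total = 0.0
    for line in open("mem.txt"):            # saved output of: squidclient mgr:mem
        fields = line.split()
        if not fields or not WANTED.match(fields[0]):
            continue
        for field in fields[1:]:
            # Take the first numeric column as the per-object size (layout assumed).
            if field.replace(".", "", 1).isdigit():
                total += float(field)
                break
    print(f"estimated per-object metadata: {total:.0f} bytes")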

The rest is harder. You need to scan a disk cache, separating out the
message headers - both counting the number of items found and the total
size of the headers processed. Then multiply the metadata size by the
number of objects in the cache and add the total message header size.

You now have the total index size and the total cache size for a given
cache. Getting the MB-per-GB figure from that should be easy and obvious.



NP: the mgr:mem "In Use" count of StoreEntry gives you approximately the
number of currently indexed objects. It does include some non-cacheable
objects currently being replied to, so it is not completely accurate. You
can use that to see how the index memory use compares to the memory used
for extra in-transit data.



> 4- Today *store_avg_object_size* should really be greater than 13 KB. The
> mean object size I can see on my own cache is about 100 KB. Can anybody
> refer me to a website where I can find fresh information?

The value for your particular Squid can be found in the cachemgr "info"
report. It is listed as "Mean Object Size".

It varies between proxies, and is directly dependent on what your
particular cache settings are compared to the traffic that proxy sees.
So even two proxies receiving the same traffic might show very different
values, and it is unlikely that any reference material you find from
other people will be anything more than a rough approximation.


For example: my test proxy caching ISP-type traffic, with a fair bit of
Facebook, YouTube, etc. going through it:
"
        Mean Object Size: 106.08 KB
"

and a production CDN proxy in front of mostly Wordpress sites:
"
        Mean Object Size: 19.20 KB
"

Both with a 200 GB cache_dir and otherwise default cache settings.
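
If you want to read that value programmatically, a small Python sketch; it assumes squidclient is installed and the cache manager answers on localhost:3128:

    import re
    import subprocess

    out = subprocess.check_output(
        ["squidclient", "-h", "localhost", "-p", "3128", "mgr:info"], text=True)
    match = re.search(r"Mean Object Size:\s*([\d.]+)\s*KB", out)
    if match:
        print(f"mean object size: {float(match.group(1))} KB")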



>
> 5- If I'm completely on the wrong track, can anybody help me find a
> formula to deduce the required RAM for a given HDD capacity (and vice
> versa)?
>

Still the same one listed in the wiki page.

Though nowadays the 2^27 objects-per-cache_dir limitation is proving to
be far more restrictive than the RAM index size. So depending on your
"Mean Object Size" you may find yourself limited to using only 100 GB or
less of a TB HDD.

Amos

Re: HDD/RAM Capacity vs store_avg_object_size

Alex Rousskov
On 07/12/2017 10:11 AM, Amos Jeffries wrote:

> On 12/07/17 22:31, bugreporter wrote:
>> Hi,
>>
>> Can anybody help me confirm my understanding of memory usage versus the
>> persistent cache capacity? Here is my understanding:
>>
>> According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:
>>
>> 1- We need 14 MB of memory per 1 GB on disk for 64-bit Squid. The wiki
>> page has been there since I first knew Squid (i.e. I'm very old now). Is
>> this information still valid?
>
> Yes. It is a rough estimate based on the size of code objects used to
> store each request message - they have not changed in at least the past
> 10 years.

For the record, the StoreEntry object has changed, but I do not know how
much those (minor!) changes affect the rough estimate. It is likely that
they do not.


>> 2- Is this assumption based on the default value of 13 KB for
>> *store_avg_object_size*?
>
> No.

Actually, the answer is probably "yes" or "yes, that or a similar mean
object size value".


> That avg object size is for the full object with payload.

... which is used to estimate how many cache_dir index entries Squid
will need to create for a cache_dir of a given size.


> The 10 or 14 MB is purely for the metadata necessary to index those
> cached objects: the HTTP message header text plus a bunch of Squid code
> objects.

HTTP headers are not a part of the in-memory cache_dir index. StoreEntry
and LruNode (or equivalent) are pretty much the only structures we place
in the cache_dir index. A "bunch of Squid code objects" are not created
for that index (but get created during a cache hit and may remain in RAM
for some time after that).


> To get the metadata size, add the per-object sizes (the first number
> column) of HttpReply + MemObject + HttpHeaderEntry + all objects whose
> name starts with HttpHdr* + StoreEntry + all objects whose name starts
> with StoreMeta*.

AFAICT, StoreEntry and LruNode (or equivalent) are the only structures
created for the cache_dir index. All other structures are not relevant
in that scope.


> Though nowadays the 2^27 objects-per-cache_dir limitation is proving to
> be far more restrictive than the RAM index size.

Agreed, but YMMV.


> So depending on your
> "Mean Object Size" you may find yourself limited to using only 100 GB or
> less of a TB HDD.

... unless you use multiple cache_dirs per HDD.

Alex.

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
In reply to this post by Amos Jeffries
Hi Amos,

Thank you so much for your guidance. You have no idea how important the key information you shared here is to me. As you recommend, I'll do my own calculations based on my own cache ASAP.

But before doing my own calculations: as you answered No to my second question, I just wanted to better understand the rough estimate on that wiki page. I think I know the difference between the index size in RAM, the cache_mem size (in RAM), and the stored objects' size in the cache_dir(s) (on disk). Now, if the rough estimate is still valid and it is not based on the default value of store_avg_object_size (13 KB), on what average (on-disk object size) value is it based?

Regarding my question 4: regardless of the Squid configuration, do you have an idea of the average size of an object on the Web? Suppose that we don't have a cache at all. The mean object size on my own cache is very close to yours: 101 KB. According to a small ISP, the average object size that they observe is about 850 KB... According to some statistics on some websites, the average object size on the Web is about 24 KB today. I'm lost...

Kind Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
In reply to this post by Amos Jeffries
Hi Amos,

When you say:

"The rest is harder. You need to do a scan of a disk cache separating the
message headers - both counting the number of items found and total size
of the headers processed. Multiplying the metadata size by the number of
objects in the cache and adding the total message header size."

What do you mean by message header in this context? Just the first line in each file, or all HTTP headers (the head of each file up to the first blank line)? If you mean all HTTP headers, and if I correctly understood what Alex said, HTTP headers are not taken into consideration in this context. Should I therefore eliminate them from the calculation?

Warm Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
In reply to this post by Alex Rousskov
Hi Alex,

Thank you for your contribution to this post. I see that you answered roughly YES to my question #2. I think that we agree... But what is this mean size value? Do you have any update to share, please?

According to your answer, HTTP headers are not part of the in-memory index and we have to consider StoreEntry and LruNode (or equivalent) only. Can you please be more specific about which inputs of the cachemgr "mem" report we should take into account?

Warm Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Alex Rousskov
On 07/13/2017 10:17 AM, bugreporter wrote:

> But what is this mean size value? Do you have any update to share, please?

I do not know what value the wiki page authors used, but I suspect it
was close to 13 KB. I believe Amos has already suggested that you use the
actual mean cacheable response size from your environment, and I agree
with that suggestion (although it is not trivial to follow). There is no,
and cannot be, an "authoritative" mean value that works well for all
environments.


> According to your answer, HTTP headers are not part of the in-memory index
> and we have to consider StoreEntry and LruNode (or equivalent) only. Can you
> please be more specific about which inputs of the cachemgr "mem" report we
> should take into account?

Not without doing research. I also hesitate to recommend mgr:mem output
to folks who cannot or do not want to find answers about it by studying
the associated code themselves -- it is just too easy to misinterpret
those low-level stats!

You may be able to figure it out on your own, without relying on
low-level stats, by disabling the memory cache, filling your disk cache
with, say, 1,000, 10,000, and then 100,000 identical or similar objects,
and measuring the memory usage growth. You can then try to confirm your
findings by comparing them with the previously collected mgr:info,
mgr:storedir, and mgr:mem output for each stage of the experiment.
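
If it helps with that last step, here is a self-contained Python sketch of fitting the line; the (object count, RAM bytes) pairs are placeholders for your own measurements, and the slope b estimates the per-object index cost:

    # Fit ram ~= a + b*n by ordinary least squares; b is the per-object cost.
    samples = [(1_000, 10_200_000), (10_000, 12_100_000), (100_000, 31_000_000)]

    n_mean = sum(n for n, _ in samples) / len(samples)
    r_mean = sum(r for _, r in samples) / len(samples)
    b = (sum((n - n_mean) * (r - r_mean) for n, r in samples)
         / sum((n - n_mean) ** 2 for n, _ in samples))
    a = r_mean - b * n_mean
    print(f"per-object cost b ~ {b:.0f} bytes; fixed overhead a ~ {a / 2**20:.1f} MiB")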

Alex.

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
Hi Alex,

By doing so I'll get a new (or the same) rough estimate, which is not what I'm really looking for. Actually, I need a formula based on the mean object size so that I can periodically (with a cron job) get the mean object size and, with the help of that formula, reconfigure Squid accordingly. The reconfiguration will be as follows:

- If the mean object size is too low compared to the RAM/HDD ratio, then I can reduce Squid's HDD usage (cache_dir ...  Low-Mbytes-Size ...). A reload of the new Squid configuration should be sufficient, shouldn't it? Or will I need to restart Squid?

- If the mean object size is too high compared to the RAM/HDD ratio, then I can fully use the HDD for Squid and do some optimizations (for instance, as the RAM will not be fully used by the in-memory index, I can use it for cache_mem).

Kind Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Alex Rousskov
On 07/14/2017 02:11 AM, bugreporter wrote:

> By doing so I'll get a new (or the same) rough estimate, which is not what
> I'm really looking for.

You will get an accurate-enough formula, which is what you should be
looking for.


> Actually, I need a formula based on the mean object size

That is what I am trying to give you. Sorry if I was not explicit
enough. Idle Squid memory requirements can be computed using the
following formula:

    RAM used for HTTP caching purposes =
        RAM used by all cache indexes (cache_dirs and cache_mem) +
        cache_mem

where

    RAM used by a single cache index = C + v*n

where

    C is an unknown constant representing the size of the in-RAM overhead
      of having a single (empty) cache (cache_dir or cache_mem).
      C depends on the Squid build and configuration.
      C is normally a lot less than v*n, so you might just ignore it.

    v is an unknown constant representing the size of the in-RAM overhead
      of indexing a single cached object.
      v depends on the Squid build (e.g., 32- vs 64-bit).
      v should be close to the sum of the StoreEntry and LruNode sizes.

    n is the number of objects in the cache, which you can estimate by
      dividing the cache size by the mean cached object size.

The experiments I suggested can be used to estimate the C and v
constants required to compute the "RAM used by cache indexes" component.
You can measure/estimate/configure/control everything else in the formula.

The formula works well for large n, where various rounding effects
become negligible.
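
In code form, the model looks like this Python sketch (v and C come from your experiments, everything else from your configuration; the example numbers at the end are made up):

    def index_ram(cache_size_bytes, mean_object_size_bytes, v, C=0):
        """RAM used by a single cache index: C + v*n."""
        n = cache_size_bytes / mean_object_size_bytes    # estimated object count
        return C + v * n

    def caching_ram(cache_dir_sizes, cache_mem_bytes, mean_object_size_bytes, v, C=0):
        """RAM used for HTTP caching: one index per cache, plus cache_mem itself."""
        caches = list(cache_dir_sizes) + [cache_mem_bytes]
        indexes = sum(index_ram(s, mean_object_size_bytes, v, C) for s in caches)
        return indexes + cache_mem_bytes

    # Example: one 200 GB cache_dir, 256 MB cache_mem, 100 KB mean objects, v = 207:
    ram = caching_ram([200 * 2**30], 256 * 2**20, 100 * 1024, v=207)
    print(f"{ram / 2**20:.0f} MiB")    # prints ~671 MiB

Solving the same model for the disk size instead of the RAM gives you the "vice versa" direction of your question 5: disk capacity ~= RAM budget / v * mean object size.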

How you use this formula/model is up to you.

If your experiments prove the formula wrong, please discuss!


Thank you,

Alex.
P.S. Please note that a busy Squid also consumes memory for in-transit
transactions and other caches. If you know how much Squid consumes for
HTTP caching, then you can effectively measure other overheads, which
will also vary from one deployment environment to another.

Re: HDD/RAM Capacity vs store_avg_object_size

Amos Jeffries
In reply to this post by bugreporter
On 14/07/17 04:13, bugreporter wrote:

> Hi Amos,
>
> When you say:
>
> /"The rest is harder. You need to do a scan of a disk cache separating the
> message headers - both counting the number of items found and total size
> of the headers processed. Multiplying the metadata size by the number of
> objects in the cache and adding the total message header size."
> /
> What do you mean by message header in this context? Just the first line in
> each file or all HTTP headers (head of each file until the first \r\n)? If
> you mean all HTTP headers and if I correctly understood what Alex says, HTTP
> headers are not taken into the consideration in this context. Therefore
> should I eliminate them in the calculation?
>

That's what I meant, but as Alex pointed out, I was wrong. StoreEntry
only pulls a small bit of the cached metadata into the index, and that is
counted directly in the StoreEntry object size. The rest is only pulled
in when things are moved into the cache_mem space for delivery to a
client.

Amos

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
In reply to this post by Alex Rousskov
Hi,

Thank you for this clarification. Can you please tell me the best method to measure the RAM used by Squid? Can I trust top and/or ps and look at the RSS? Or do you suggest another method (maybe using the cache manager)?

For instance, on a 64-bit machine, when I start Squid without a cache_dir and with a cache_mem of 0 MB, the "top" command gives me the following:
 
PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10996 root        0 -20   70436   3504   1000 S  0.0  0.2   0:00.00 squid
10999 squid      0 -20  624472 144052   6152 S  0.0  7.0   0:03.34 squid


But the output of "squidclient -h localhost -p 3128 mgr:info" gives me this:

Resource usage for squid:
...
Maximum Resident Size: 576208 KB

Can you please give me advice about that?

Kind Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Amos Jeffries
On 18/07/17 00:01, bugreporter wrote:

> Hi,
>
> Thank you for this clarification. Can you please tell me the best
> method to measure the RAM used by Squid? Can I trust *top* and/or *ps* and
> look at the RSS? Or do you suggest another method (maybe using the cache
> manager)?
>
> For instance, on a 64-bit machine, when I start Squid without a cache_dir
> and with a cache_mem of 0 MB, the "*top*" command gives me the following:
>
> PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
> 10996 root        0 -20   70436   3504   1000 S  0.0  0.2   0:00.00 squid
> 10999 squid      0 -20  624472 144052   6152 S  0.0  7.0   0:03.34 squid
>
> But the output of "squidclient -h localhost -p 3128 mgr:info" gives me this:
>
> Resource usage for squid:
> ...
> Maximum Resident Size: 576208 KB
>
> Can you please give me advice about that?

That is the maximum under the highest peak this Squid has apparently
encountered. The value comes directly from the getrusage() syscall;
Squid does not maintain that value itself.
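
If you want the current (rather than peak) resident size on Linux, reading /proc is one option; a minimal Python sketch (pass the worker PID, e.g. the 10999 from your top output):

    import sys

    def vm_rss_kb(pid):
        # VmRSS in /proc/<pid>/status is the current resident set size (in kB).
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return None

    print(vm_rss_kb(int(sys.argv[1])), "kB")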

HTH
Amos

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
Thank you Amos,

OK, so how can I accurately measure the memory usage?

Kind Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Amos Jeffries
On 18/07/17 02:56, bugreporter wrote:
> Thank you Amos,
>
> OK, so how can I accurately measure the memory usage?
>

I don't have an answer to that one, sorry.

I personally just use the top values.

Amos

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
In reply to this post by Alex Rousskov
Hi Alex & Amos. Below are my results:

On an x64 machine:
v ~ 207 bytes

On an x86 machine:
v ~ 116 bytes

Warm Regards,
Bug Reporter

Re: HDD/RAM Capacity vs store_avg_object_size

Alex Rousskov
On 07/18/2017 12:32 PM, bugreporter wrote:
> Hi Alex & Amos. Below are my results:
>
> On an x64 machine:
> v ~ 207 bytes

For the record, sizeof(StoreEntry) + sizeof(LruNode) = 104 + 24 = 128
bytes (for Squid v5 on an x64 host).

If your results are correct, we cannot account for ~80 bytes, which is
~50 bytes too many to attribute to various index storage overheads IMO.
This is not important for you (you should use the numbers you got as
long as you trust them), but a developer should investigate where that
memory goes.


>> According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:
>> We need 14 MB of memory per 1 GB on disk for 64-bit Squid


Assuming a 13 KB mean object size would give us another x64 data point:
v ~ 182 bytes


> On an x86 machine:
> v ~ 116 bytes


Alex.

Re: HDD/RAM Capacity vs store_avg_object_size

bugreporter
Hi,

FYI, I had the same object (an image) duplicated x1,000, x10,000, x30,000, x60,000, x100,000, x130,000, x160,000 and finally x200,000. The real size of my object was ~45 KB (48 KB for Squid, as it counts headers plus filesystem structure, I guess).

The growth was almost linear, and the values I posted here are averages.
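
A rough cross-check of those numbers (arithmetic only, in Python, using the v ~ 207 figure above and my 48 KB per-object size):

    v = 207                                     # measured index RAM per object, x64
    print(v * 200_000 / 2**20)                  # ~39.5 MiB of index RAM at the largest fill
    print(v * (2**30 / (48 * 1024)) / 2**20)    # ~4.3 MiB of index RAM per GB of disk
    print(v * (2**30 / (13 * 1024)) / 2**20)    # ~15.9 MiB per GB at the 13 KB default

The last number is at least in the same ballpark as the wiki's 14 MB per GB.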

Kind Regards,  
Bug Reporter