Cache reference age for heap LRU/LFUDA and rock/aufs


Cache reference age for heap LRU/LFUDA and rock/aufs

Ivan Larionov
Hello!

Is it possible to get a metric similar to "LRU reference age" (or "LRU expiration") when using heap LRU/LFUDA and aufs/rock?

What we need to do is figure out the age of the oldest least-accessed object in the cache, or the age of the last replaced object.

If my description is somehow unclear – we need to answer the question "How many days ago was the oldest object that is no longer being accessed put into the cache?"

With aufs/lru we had an "LRU reference age" metric (or something like it) in the mgr:info report, but with the currently used heap lru/lfuda and rock/aufs I don't see it there. The SNMP metric also shows:

> SQUID-MIB::cacheCurrentLRUExpiration.0 = Timeticks: (0) 0:00:00.00

If you're wondering why we would need to know that – it's related to GDPR and removing the data of closed customers' accounts. We need to make sure that we don't have any "not being accessed anymore" objects older than the "data retention period".


Thanks!

--
With best regards, Ivan Larionov.

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Cache reference age for heap LRU/LFUDA and rock/aufs

Alex Rousskov
On 02/09/2018 06:42 PM, Ivan Larionov wrote:

> Hello!
>
> Is it possible to get a metric similar to "LRU reference age" (or "LRU
> expiration") when using heap LRU/LFUDA and aufs/rock?
>
> What we need to do is to figure out the age of the oldest least accessed
> object in the cache. Or the age of the last replaced object.
>
> If my description is somehow unclear – we need to answer the question
> "How many days ago was the oldest object that is no longer being
> accessed put into the cache?"
>
> With aufs/lru we had "LRU reference age" or something like this in
> mgr:info report, but with currently used heap lru/lfuda and rock/aufs I
> don't see it there. SNMP metric also shows:
>
>> SQUID-MIB::cacheCurrentLRUExpiration.0 = Timeticks: (0) 0:00:00.00

I cannot answer your question for aufs, but please note that rock
cache_dirs do not support/have/use a configurable replacement policy:
Each incoming object is assigned a slot based on its key hash. With
modern rock code, it is possible to remove that limitation IIRC, but
nobody has done that.


> If you're wondering why we would need to know that – it's related to
> GDPR and removing the data of closed customers' accounts. We need to
> make sure that we don't have any "not being accessed anymore" objects
> older than the "data retention period".

If it is important to get this right, then I would not trust replacement
policy metadata with this: The corresponding code interfaces look
unreliable to me, and access counts/timestamps for a ufs-based cache_dir
are not updated across Squid restarts when the swap log is lost (at least).

I would instead configure Squid to prohibit serving hits that are too
old. That solution does not match your problem exactly, but it may be
good enough and should work a lot more reliably across all cache_dirs.
If there is no "age" ACL to use with the send_hit directive, then you
may need to add one.

    http://www.squid-cache.org/Doc/config/send_hit/
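If such an ACL existed, the configuration might look roughly like this (a sketch only: "store_age" is a hypothetical ACL type standing in for the object-age ACL that may need to be added; it is not a real Squid ACL, although send_hit itself is):

```
# HYPOTHETICAL sketch: "store_age" is not an existing Squid ACL type;
# it stands in for the object-age ACL mentioned above. send_hit is real:
# it controls whether a cached object may be served as a hit.
acl tooOld store_age 30 days
send_hit deny tooOld
```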

You may also be able to accomplish the same using refresh_pattern, but I
am a little worried about various exceptional/special conditions
implemented on top of that directive. Others on this list may offer
better guidance in this area.
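For reference, a refresh_pattern-based cap on hit age might look roughly like this (a sketch assuming a 30-day retention period; 43200 minutes is 30 days, and as noted above, exceptions layered on top of this directive may still override it):

```
# Sketch: treat every object as stale once it is 30 days old
# (43200 minutes), forcing revalidation instead of a direct hit.
# Exceptions built on top of refresh_pattern may still bypass this.
refresh_pattern . 0 20% 43200
```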


HTH,

Alex.

Re: Cache reference age for heap LRU/LFUDA and rock/aufs

Ivan Larionov
On Fri, Feb 9, 2018 at 7:50 PM, Alex Rousskov <[hidden email]> wrote:

I cannot answer your question for aufs, but please note that rock
cache_dirs do not support/have/use a configurable replacement policy:
Each incoming object is assigned a slot based on its key hash. With
modern rock code, it is possible to remove that limitation IIRC, but
nobody has done that.

Yeah, I figured this out from the source code, and I'm extremely surprised that it was never mentioned in the documentation. I think it will be a huge blocker in our squid 4 + SMP + rock migration plan.

So what does rock do when storage is full then?
 


> If you're wondering why we would need to know that – it's related to
> GDPR and removing the data of closed customers' accounts. We need to
> make sure that we don't have any "not being accessed anymore" objects
> older than the "data retention period".

If it is important to get this right, then I would not trust replacement
policy metadata with this: The corresponding code interfaces look
unreliable to me, and access counts/timestamps for a ufs-based cache_dir
are not updated across Squid restarts when the swap log is lost (at least).


It's actually fine. We never restart squid, and if it's restarted for any unexpected reason (host reboot, crash, or whatever) we just replace the host.
 
I would instead configure Squid to prohibit serving hits that are too
old. That solution does not match your problem exactly, but it may be
good enough and should work a lot more reliably across all cache_dirs.
If there is no "age" ACL to use with the send_hit directive, then you
may need to add one.

    http://www.squid-cache.org/Doc/config/send_hit/

You may also be able to accomplish the same using refresh_pattern, but I
am a little worried about various exceptional/special conditions
implemented on top of that directive. Others on this list may offer
better guidance in this area.


I was thinking about a similar solution, but this is exactly why I wasn't able to use it – there seems to be no ACL suitable for such a task.

We could always just replace the host every month or so, but that would mean starting with a cold cache every time, which I wanted to avoid.

I found this debug option for the heap policies which could probably help in understanding the approximate cache age, but it doesn't work with rock because rock uses some "simple scan" policy.

> src/repl/heap/store_repl_heap.cc:        debugs(81, 3, "Heap age set to " << h->theHeap->age);
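For reference, that message could be surfaced with a debug_options setting like the following (a sketch; the section number 81 and level 3 are taken from the `debugs(81, 3, ...)` call above):

```
# Keep general logging at level 1, but raise removal-policy debugging
# (cache.log section 81, per the debugs() call above) to level 3.
debug_options ALL,1 81,3
```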


HTH,

Alex.



--
With best regards, Ivan Larionov.


Re: Cache reference age for heap LRU/LFUDA and rock/aufs

Alex Rousskov
On 02/12/2018 04:25 PM, Ivan Larionov wrote:
> On Fri, Feb 9, 2018 at 7:50 PM, Alex Rousskov wrote:
>
>     please note that rock
>     cache_dirs do not support/have/use a configurable replacement policy:
>     Each incoming object is assigned a slot based on its key hash. With
>     modern rock code, it is possible to remove that limitation IIRC, but
>     nobody has done that.

> So what does rock do when storage is full then?

Becoming full is not a special condition for rock cache_dirs. In other
words, a rock cache_dir does not do anything special when it becomes (or
is) full.

Roughly speaking, when a rock cache_dir needs to store a new entry, it
computes its starting location by hashing the entry URL and starts
storing the entry at that location, overwriting any cached entry that
used the same starting location. Same for the shared memory cache. See
Ipc::StoreMap::openForWriting().
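The behavior described above can be illustrated with a toy model (a sketch in Python, not Squid's actual code; the hash function and flat slot array are simplified assumptions standing in for the real store-key hashing and slot layout):

```python
# Toy model of how a rock cache_dir picks a slot: the starting slot is
# derived from the entry's key hash, and whatever lived there before is
# silently overwritten -- there is no replacement-policy decision.
import hashlib

class ToyRockDir:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def _slot_for(self, url):
        # Squid hashes the store key; a plain MD5 of the URL stands in here.
        digest = hashlib.md5(url.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def store(self, url, body):
        i = self._slot_for(url)
        evicted = self.slots[i]      # overwritten, never "replaced" by policy
        self.slots[i] = (url, body)
        return evicted

    def lookup(self, url):
        i = self._slot_for(url)
        entry = self.slots[i]
        if entry and entry[0] == url:
            return entry[1]
        return None                  # miss: slot empty or holds another URL

cache = ToyRockDir(4)                # tiny cache to force collisions
for u in ("http://a/", "http://b/", "http://c/", "http://d/", "http://e/"):
    cache.store(u, "body of " + u)
```

With 5 entries and only 4 slots, at least one earlier entry is necessarily overwritten, without any regard to how recently or frequently it was accessed.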

Rock is meant/optimized for large caches. The probability of overwriting
a valuable entry in a large cache is small -- most cached entries are
not valuable because they will never be requested again.
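That intuition can be quantified with a back-of-the-envelope model (an illustration, not anything Squid computes): if slots are picked uniformly at random, the chance that one particular entry is overwritten by m subsequent stores into n slots is 1 - (1 - 1/n)^m, which shrinks rapidly as the cache grows.

```python
# Back-of-the-envelope overwrite risk in a hash-addressed cache:
# with n slots and m subsequent stores landing uniformly at random,
# one particular entry's slot is hit with probability 1 - (1 - 1/n)**m.
def overwrite_probability(n_slots, n_stores):
    return 1.0 - (1.0 - 1.0 / n_slots) ** n_stores

small = overwrite_probability(1_000, 100)        # small cache: ~9.5% risk
large = overwrite_probability(10_000_000, 100)   # large cache: ~0.001% risk
```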

Needless to say, there are cases where such a crude approach is harmful
(e.g., two popular entries competing for the same unlucky location and
overwriting each other). As I said, modern SMP caching code may have
enough bells and whistles to support such cases better, but the required
changes are still far from trivial, and nobody has contributed or
sponsored that improvement (yet?).


>     If it is important to get this right, then I would not trust replacement
>     policy metadata with this: The corresponding code interfaces look
>     unreliable to me, and access counts/timestamps for a ufs-based cache_dir
>     are not updated across Squid restarts when the swap log is lost (at
>     least).

> It's actually fine, we never restart squid and if it restarted by any
> unexpected reason (host reboot, crash or w/e) we just replace the host.

FWIW, please note that my "unreliable" remark applies to Squids that are
never restarted. YMMV.


>     If there is no "age" ACL to use with the send_hit directive, then you
>     may need to add one.
>
> I was thinking about similar solution but this is exactly why I wasn't
> able to use it – there seems to be no acl suitable for such task.

IMO, treating the Squid feature set as a constant outside of your control
means ignoring 50% of Squid's advantages (while suffering from 100% of its
drawbacks). One can add the missing pieces:

https://wiki.squid-cache.org/SquidFaq/AboutSquid#How_to_add_a_new_Squid_feature.2C_enhance.2C_of_fix_something.3F


HTH,

Alex.