ACL matches when it shouldn't


Vieri

Regarding the use of an external ACL, I quickly implemented a Perl script that "does the job", but it seems somewhat sluggish.

This is how it's configured in squid.conf:
external_acl_type bllookup ttl=86400 negative_ttl=86400 children-max=80 children-startup=10 children-idle=3 concurrency=8 %PROTO %DST %PORT %PATH /opt/custom/scripts/squid/ext_txt_blwl_acl.pl --categories=adv,aggressive,alcohol,anonvpn,automobile_bikes,automobile_boats,automobile_cars,automobile_planes,chat,costtraps,dating,drugs,dynamic,finance_insurance,finance_moneylending,finance_other,finance_realestate,finance_trading,fortunetelling,forum,gamble,hacking,hobby_cooking,hobby_games-misc,hobby_games-online,hobby_gardening,hobby_pets,homestyle,ibs,imagehosting,isp,jobsearch,military,models,movies,music,podcasts,politics,porn,radiotv,recreation_humor,recreation_martialarts,recreation_restaurants,recreation_sports,recreation_travel,recreation_wellness,redirector,religion,remotecontrol,ringtones,science_astronomy,science_chemistry,sex_education,sex_lingerie,shopping,socialnet,spyware,tracker,updatesites,urlshortener,violence,warez,weapons,webphone,webradio,webtv

I'd like to avoid the use of a DB if possible, but maybe someone here has an idea to share on flat file text searches.

Currently the dir structure of my blacklists is:

topdir/
    category1/ ... categoryN/
        domains  urls

So basically one example file to search in is topdir/category8/urls, etc.

The helper perl script contains this code to decide whether to block access or not:

foreach ( @categories )
{
        # grep -x matches the whole line, -n prefixes the matching line number
        # (which head/cut extract).  Note: $uri_dst and $uri_path are interpolated
        # into a shell command without quoting or escaping, which is fragile.
        chomp($s_urls = qx{grep -nwx '$uri_dst$uri_path' $cats_where/$_/urls | head -n 1 | cut -f1 -d:});

        if (length($s_urls) > 0) {
            if ($whitelist == 0) {
                $status = $cid." ERR message=\"URL ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
            } else {
                $status = $cid." ERR message=\"URL ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
            }
            last;    # one match is enough; "next" here would keep scanning and let a later category overwrite $status
        }

        chomp($s_urls = qx{grep -nwx '$uri_dst' $cats_where/$_/domains | head -n 1 | cut -f1 -d:});

        if (length($s_urls) > 0) {
            if ($whitelist == 0) {
                $status = $cid." ERR message=\"Domain ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
            } else {
                $status = $cid." ERR message=\"Domain ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
            }
            last;    # as above
        }
}

There are currently 66 "categories" with around 50MB of text data in all.
So that's a lot to go through each time there's an HTTP request.
Apart from placing these blacklists on a ramdisk (they are currently on an M.2 SSD, so I'm not sure I'd notice any difference), what else can I try?
Should I reindex the lists and group them all alphabetically?
For instance should I process the lists in order to generate a dir structure as follows?

topdir/
    a/ b/ c/ d/ e/ f/ ... x/ y/ z/ 0/ 1/ 2/ 3/ ... 7/ 8/ 9/
        domains  urls

An example for a client requesting https://www.google.com/ would lead to searching only 2 files:
topdir/w/domains
topdir/w/urls

An example for a client requesting https://01.whatever.com/x would also lead to searching only 2 files:
topdir/0/domains
topdir/0/urls

An example for a client requesting https://8.8.8.8/xyz would also lead to searching only 2 files:
topdir/8/domains
topdir/8/urls

Any ideas or links to scripts that already prepare lists for this?
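Here is a rough, untested sketch of the kind of preparation script I have in mind, in case it helps (the `shard_lists` name and the source/destination paths are made up for illustration; it assumes the per-category layout described above):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);

# Sketch: merge every topdir/<category>/domains and .../urls file into
# topdir_by_char/<first-char>/{domains,urls}, so a lookup only has to scan
# the shard matching the first character of the requested host.
sub shard_lists {
    my ($src, $dst) = @_;
    for my $kind ('domains', 'urls') {
        my %out;    # first character -> open output filehandle
        for my $list (glob "$src/*/$kind") {
            open my $in, '<', $list or die "$list: $!";
            while (my $line = <$in>) {
                next unless $line =~ /^(\w)/;    # shard key: first alphanumeric char
                my $c = lc $1;
                unless ($out{$c}) {
                    make_path("$dst/$c");
                    open $out{$c}, '>>', "$dst/$c/$kind" or die "$dst/$c/$kind: $!";
                }
                print { $out{$c} } $line;
            }
            close $in;
        }
        close $_ for values %out;
    }
}

shard_lists('topdir', 'topdir_by_char');
```

Sorting each shard afterwards would additionally allow a binary search instead of a linear grep; note that sharding this way only works for exact-line matches, as with the grep -x calls above.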

Thanks,

Vieri
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: ACL matches when it shouldn't

Marcus Kool
Of course this script is sluggish: it reads many category files and forks three processes (grep, head and cut) once or twice per category.

If you *really* want to implement this with a Perl script, it should read all files once at startup and then answer each lookup from in-memory Perl data structures.
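Something like this minimal, untested sketch (the file layout and the helper fields are taken from your post; the hash shape and subroutine names are my assumption):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Slurp every category list into memory once at helper startup, then answer
# each request with O(1) hash lookups instead of forking grep/head/cut.
my %bl;    # e.g. $bl{domains}{'www.example.com'} = 'porn' (first category wins)

sub load_lists {
    my ($topdir, @categories) = @_;
    for my $cat (@categories) {
        for my $kind ('domains', 'urls') {
            open my $fh, '<', "$topdir/$cat/$kind" or next;
            while (my $line = <$fh>) {
                chomp $line;
                $bl{$kind}{$line} //= $cat;
            }
            close $fh;
        }
    }
}

# Return the matching category (or undef) for one request.
sub lookup {
    my ($uri_dst, $uri_path) = @_;
    return $bl{urls}{"$uri_dst$uri_path"} // $bl{domains}{$uri_dst};
}
```

Keep in mind that each helper child gets its own copy of the hashes, so with children-max=80 the memory cost of 50MB of lists multiplies; fewer children (concurrency=8 already gives parallelism per child) would be the usual mitigation.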

But I suggest looking at ufdbGuard, a URL filter that is far faster and has all the functionality that you need.

Marcus


On 2020-10-02 10:08, Vieri wrote:

> Regarding the use of an external ACL I quickly implemented a perl script that "does the job", but it seems to be somewhat sluggish.
> [...]