How to create a simple whitelist using regexes?


RB
Hi everyone,

I'm trying to deny all urls except for only whitelisted regular expressions. I have only this regular expression in my file "squid_sites.txt":

^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
My "squid.conf"

debug_options 28,7

###
### Global settings define
###

http_port 3128

###
### Authorization rules define
###

###
### Networks define
###

acl localnet src 10.5.0.0/1
acl localnet src 172.16.0.0/16
acl localnet src fc00::/7
acl localnet src fe80::/10

###
### Ports define
###

acl SSL_ports port 443          # https
acl SSL_ports port 22           # SSH
acl Safe_ports port 80          # http
acl Safe_ports port 443         # https
acl Safe_ports port 22          # SSH

acl purge method PURGE

acl CONNECT method CONNECT

acl bastion src 10.5.0.0/1
acl whitelist url_regex "/vagrant/squid_sites.txt"

###
### Rules define
###

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access allow localhost
http_access allow purge localhost
http_access deny purge
http_access deny CONNECT !SSL_ports

http_access allow bastion whitelist
http_access deny bastion all

# http_access deny all

###
### Secondary global settings define
###


# icp_access allow localnet
# icp_access deny all
#
# htcp_access allow localnet
# htcp_access deny all

# Add any of your own refresh_pattern entries above these.
access_log /var/log/squid3/access.log squid
cache_log /var/log/squid3/cache.log squid
cache_store_log /var/log/squid3/store.log squid

refresh_pattern      ^ftp:      1440  20%  10080
refresh_pattern     ^gopher:      1440  0%  1440
refresh_pattern      -i (/cgi-bin/|\?)    0  0%  0
refresh_pattern     (Release|Package(.gz)*)$  0  20%  2880

coredump_dir /var/spool/squid3
maximum_object_size 1024 MB
cache_mem 2048 MB

I tried enabling debugging and tailing /var/log/squid3/cache.log but my curl statement keeps matching "all".

$ curl -sSL --proxy localhost:3128 -D - "https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep Squid
X-Squid-Error: ERR_ACCESS_DENIED 0

Any ideas what I'm doing wrong?

Thank you.

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users
Re: How to create a simple whitelist using regexes?

Matus UHLAR - fantomas
On 15.10.18 01:04, RB wrote:
>I'm trying to deny all urls except for only whitelisted regular
>expressions. I have only this regular expression in my file
>"squid_sites.txt"
>
>^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*

are you aware that you can only see CONNECT in https requests, unless using
ssl_bump?
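
In other words, for an HTTPS request the proxy only ever evaluates the CONNECT target (host:port), never the full URL. As an aside not taken from this thread, a host-based whitelist that does work for CONNECT requests could be sketched with dstdomain instead of url_regex:

```
# Sketch only: whitelist the CONNECT host rather than the full URL.
# dstdomain matches the destination host, which is all Squid sees for
# un-bumped HTTPS traffic. The leading dot also matches subdomains.
acl whitelist_hosts dstdomain .squid-cache.org
http_access allow bastion whitelist_hosts
```

This cannot distinguish paths on the same host; that granularity genuinely requires ssl_bump.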


>acl bastion src 10.5.0.0/1
>acl whitelist url_regex "/vagrant/squid_sites.txt"
[...]

>http_access allow manager localhost
>http_access deny manager
>http_access deny !Safe_ports
>http_access allow localhost
>http_access allow purge localhost
>http_access deny purge
>http_access deny CONNECT !SSL_ports
>
>http_access allow bastion whitelist
>http_access deny bastion all

>I tried enabling debugging and tailing /var/log/squid3/cache.log but my
>curl statement keeps matching "all".

of course it matches "all" - everything matches "all".

I wonder more why it doesn't match "http_access allow localhost"

>$ curl -sSL --proxy localhost:3128 -D - "
>https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep
>Squid
>X-Squid-Error: ERR_ACCESS_DENIED 0

>Any ideas what I'm doing wrong?

have you reloaded squid config after changing it?
Did squid confirm it?

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
It's now safe to throw off your computer.
Re: How to create a simple whitelist using regexes?

RB
Hi Matus,

Thanks for responding so quickly. I uploaded my configurations here if that is more helpful: https://bit.ly/2NF4zNb

The config that I previously shared is called squid_corp.conf. I also noticed that if I don't use regular expressions and instead use domains, it works correctly:

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex .squid-cache.org

Every time my squid.conf or my squid_sites.txt is modified, I restart the squid service

sudo service squid3 restart

Then I use curl to test and now the url works. 

$ curl -sSL --proxy localhost:3128 -D - https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
HTTP/1.1 200 Connection established

HTTP/1.1 200 OK
Date: Mon, 15 Oct 2018 14:47:33 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Cookie,User-Agent,Accept-Encoding
Content-Length: 101912
Cache-Control: max-age=3600
Expires: Mon, 15 Oct 2018 15:47:33 GMT
Content-Type: text/html; charset=utf-8

But this does not let me get more granular: I can allow all subdomains and paths for the domain squid-cache.org, but I'm unable to allow only the specific regular expressions, whether I put them inline or in squid_sites.txt.

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*

If I put them inline like I have above, squid says the following when I restart it:

2018/10/15 14:54:48 kid1| strtokFile: .*squid-cache.org/SquidFaq/SquidAcl.* not found

If I put the expressions in squid_sites.txt, the "not found" message isn't shown, and this is the debug output in /var/log/squid3/cache.log (full output https://pastebin.com/NVwRxVmQ):

2018/10/15 15:05:45.083 kid1| Checklist.cc(275) matchNode: 0x7fb0068da2b8 matched=1 async=0 finished=0
2018/10/15 15:05:45.083 kid1| Acl.cc(336) matches: ACLList::matches: checking whitelist
2018/10/15 15:05:45.083 kid1| Acl.cc(319) checklistMatches: ACL::checklistMatches: checking 'whitelist'
2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0
2018/10/15 15:05:45.084 kid1| Acl.cc(349) matches: whitelist mismatched.
2018/10/15 15:05:45.084 kid1| Acl.cc(354) matches: whitelist result is false

So it's failing the regular expression check. If I use grep to verify that the regex works, it does.

$ echo https://wiki.squid-cache.org/SquidFaq/SquidAcl | grep "^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"
https://wiki.squid-cache.org/SquidFaq/SquidAcl

> are you aware that you can only see CONNECT in https requests, unless using
> ssl_bump?

Ah, interesting. Are you saying that my https connections will always fail unless I use ssl_bump to decrypt them? How would this work correctly in production? Does squid only block urls when it can see http? How do you configure ssl_bump for this case, and is that viable in production?

> of course it matches all, everything should match "all".
> I more wonder why doesn't it match "http_access allow localhost"
>
> have you reloaded squid config after changing it?
> Did squid confirm it?

Would you have an example of one entire config file that would work to whitelist an http/https url using a regular expression?

Best,


Re: How to create a simple whitelist using regexes?

RB
I think I know what the issue is, which gives us a clue about what is going on.

2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0

The above seems to be applying the regex to "wiki.squid-cache.org:443" instead of to "https://wiki.squid-cache.org/SquidFaq/SquidAcl". I added the regex ".*squid-cache.org.*" to my list of regular expressions and now I see this.

2018/10/15 15:16:03.641 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)'
2018/10/15 15:16:03.641 kid1| RegexData.cc(93) match: aclRegexData::match: match '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)' found in 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 1

Any idea why url_regex wouldn't try to match the full url and instead only matches on the subdomain, host domain, and port? 
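
The failure is reproducible outside Squid. As a rough sketch (using Python's re module as a stand-in for Squid's regex matching), the anchored pattern from squid_sites.txt simply cannot match the host:port string that Squid tests for a CONNECT request:

```python
import re

# The pattern from squid_sites.txt (dots left unescaped, as in the original)
pattern = r"^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"

# For an HTTPS request, Squid only tests the CONNECT target:
connect_target = "wiki.squid-cache.org:443"
# The pattern was written against the full URL:
full_url = "https://wiki.squid-cache.org/SquidFaq/SquidAcl"

print(re.search(pattern, connect_target))        # None: no scheme or path to anchor on
print(re.search(pattern, full_url) is not None)  # True
```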

The Squid FAQ says the following:

url_regex: URL regular expression pattern matching
urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname

with this example given

acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$

This behavior seems to be the same in 3.3.8 (the default on ubuntu 14.04) and 3.5.12 (the default on ubuntu 16.04).

Is there another configuration that forces url_regex to match the entire url, or should I use a different acl type?

Best,

Re: How to create a simple whitelist using regexes?

RB
Hi again...

After some more research, it looks like squid only has access to the url domain when the request is HTTPS, and the only way to get the url path and query string is to use ssl_bump to decrypt the https traffic so squid can see the path and query arguments.

To use ssl_bump, I have to compile the code from source with --enable-ssl, create a certificate, and add it to the chain of certs on every other vm that proxies through squid. Then squid can decrypt the https urls to see paths and query args, and finally apply the regexes to those urls so that only explicitly whitelisted urls are allowed.

Is this correct?
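
For reference, a minimal ssl_bump setup along those lines could be sketched as follows. This is not taken from this thread; it is based on the general Squid-3.5 ssl_bump scheme, and the file paths and CA file name are assumptions:

```
# Sketch: assumes Squid was built with SSL support (--enable-ssl /
# --with-openssl) and a locally generated CA at /etc/squid/ca.pem
# that every client VM trusts.
http_port 3128 ssl-bump cert=/etc/squid/ca.pem generate-host-certificates=on
sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/ssl_db -M 4MB

acl step1 at_step SslBump1
ssl_bump peek step1   # read the TLS client hello / SNI first
ssl_bump bump all     # then decrypt, exposing full URLs to url_regex
```

With traffic bumped this way, url_regex ACLs see the full decrypted URL instead of only host:port.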

Re: How to create a simple whitelist using regexes?

Alex Rousskov
On 10/15/2018 10:48 AM, RB wrote:

> After some more research it looks like squid only has access to the url
> domain if it's HTTPS and the only way to get the url path and query
> string is to use ssl_bump to decrypt https so squid can see url path and
> query arguments.

Replace "url domain" with "service name". In many cases, they are about
the same today, but there is a trend for SNI values to migrate from
identifying specific sites (e.g., foo.example.com) to identifying broad
services (e.g., everything.example.com), making SNIs increasingly imprecise.

Please note that you cannot bump sites that pin their certificates or
use other measures that prevent bumping. Long-term, most sites will
probably fall into that category by switching to TLS v1.3 and hiding
their true names behind essentially fake/generic SNIs.


> To use ssl_bump, I have to compile the code from source with
> --enable-ssl, create a certificate, and add it to the chain of certs to
> every other vm that proxies through squid, then squid can decrypt the
> https urls to see paths and query args and finally apply the regex to
> those urls in order to only allow explicit regex urls.
>
> Is this correct?

Replace "add it to the chain of certs" with "add it to the set of
trusted CA certificates". CA certificates are not chained... And, yes,
every client (every "vm" in your case?) that proxies through Squid would
have to trust your CA certificate.

The above sounds correct (and will be painful) if your clients cannot
send unencrypted requests for https:... URLs to Squid. On the other
hand, if your clients can send unencrypted requests for https:... URLs
to Squid, then no bumping is necessary at all. Please note that those
unencrypted requests may be inside an encrypted TLS connection -- they
are not necessarily insecure or unsafe. Unfortunately, popular browsers
do _not_ support sending unencrypted requests for https:... URLs to proxies.


HTH,

Alex.


> On Mon, Oct 15, 2018 at 11:56 AM RB wrote:
>
>     I think I know what the issue is which can give us a clue to what is
>     going on.
>
>         2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match:
>         aclRegexData::match: checking 'wiki.squid-cache.org:443
>         <http://wiki.squid-cache.org:443/>'
>         2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match:
>         aclRegexData::match: looking for
>         '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*
>         <https://wiki.squid-cache.org/SquidFaq/SquidAcl.*%29%7C%28squid-cache.org/SquidFaq/SquidAcl.*>)'
>         2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches:
>         ACL::ChecklistMatches: result for 'whitelist' is 0
>
>     The above seems to be applying the regex to
>     "wiki.squid-cache.org:443 <http://wiki.squid-cache.org:443>" instead
>     of to "https://wiki.squid-cache.org/SquidFaq/SquidAcl". I added the
>     regex ".*squid-cache.org.*" to my list of regular expressions and
>     now I see this.
>
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(71) match:
>         aclRegexData::match: checking 'wiki.squid-cache.org:443
>         <http://wiki.squid-cache.org:443>'
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(82) match:
>         aclRegexData::match: looking for
>         '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*
>         <http://wiki.squid-cache.org/SquidFaq/SquidAcl.*%29%7C%28squid-cache.org.*>)'
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(93) match:
>         aclRegexData::match: match
>         '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*
>         <http://wiki.squid-cache.org/SquidFaq/SquidAcl.*%29%7C%28squid-cache.org.*>)'
>         found in 'wiki.squid-cache.org:443
>         <http://wiki.squid-cache.org:443>'
>         2018/10/15 15:16:03.641 kid1| Acl.cc(321) checklistMatches:
>         ACL::ChecklistMatches: result for 'whitelist' is 1
>
>
>     Any idea why url_regex wouldn't try to match the full url and
>     instead only matches on the subdomain, host domain, and port? 
>
>     The Squid FAQ <https://wiki.squid-cache.org/SquidFaq/SquidAcl> says
>     the following:
>
>         *url_regex*: URL regular expression pattern matching
>         *urlpath_regex*: URL-path regular expression pattern matching,
>         leaves out the protocol and hostname
>
>
>     with this example given
>
>         acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
>
>
>     This seems to be the case between 3.3.8 (default on ubuntu 14.04)
>     and 3.5.12 (default on ubuntu 16.04).
>
>     Is there another configuration that forces url_regex to match the
>     entire url? or should I use a different acl type?
>
>     Best,
>
>     On Mon, Oct 15, 2018 at 11:11 AM RB <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         Hi Matus,
>
>         Thanks for responding so quickly. I uploaded my configurations
>         here if that is more helpful: https://bit.ly/2NF4zNb
>
>         The config that I previously shared is called squid_corp.conf. I
>         also noticed that if I don't use regular expressions and instead
>         use domains, it works correctly:
>
>             # acl whitelist url_regex "/vagrant/squid_sites.txt"
>             acl whitelist url_regex .squid-cache.org
>             <http://squid-cache.org>
>
>
>         Every time my squid.conf or my squid_sites.txt is modified, I
>         restart the squid service
>
>             sudo service squid3 restart
>
>
>         Then I use curl to test and now the url works. 
>
>             $ curl -sSL --proxy localhost:3128 -D -
>             https://wiki.squid-cache.org/SquidFaq/SquidAcl-o /dev/null 2>&1
>             HTTP/1.1 200 Connection established
>
>             HTTP/1.1 200 OK
>             Date: Mon, 15 Oct 2018 14:47:33 GMT
>             Server: Apache/2.4.7 (Ubuntu)
>             Vary: Cookie,User-Agent,Accept-Encoding
>             Content-Length: 101912
>             Cache-Control: max-age=3600
>             Expires: Mon, 15 Oct 2018 15:47:33 GMT
>             Content-Type: text/html; charset=utf-8
>
>
>         But this does not allow me to get more granular. I can only
>         allow all subdomains and paths for the domain squid-cache.org
>         <http://squid-cache.org> but I'm unable to only allow the
>         regular expressions if I put them inline or put them in
>         squid_sites.txt.
>
>             # acl whitelist url_regex "/vagrant/squid_sites.txt"
>             acl whitelist url_regex
>             ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
>             acl whitelist url_regex
>             .*squid-cache.org/SquidFaq/SquidAcl.*
>             <http://squid-cache.org/SquidFaq/SquidAcl.*>
>
>
>         If I put them inline like I have above, when I restarted squid
>         it says the following
>
>             2018/10/15 14:54:48 kid1| strtokFile:
>             .*squid-cache.org/SquidFaq/SquidAcl.*
>             <http://squid-cache.org/SquidFaq/SquidAcl.*> not found
>
>
>         If I put the expressions in the squid_sites.txt the above "not
>         found" message isn't shown and this is the debug output in
>         /var/log/squid3/cache.log (full
>         output https://pastebin.com/NVwRxVmQ).
>
>             2018/10/15 15:05:45.083 kid1| Checklist.cc(275) matchNode:
>             0x7fb0068da2b8 matched=1 async=0 finished=0
>             2018/10/15 15:05:45.083 kid1| Acl.cc(336) matches:
>             ACLList::matches: checking whitelist
>             2018/10/15 15:05:45.083 kid1| Acl.cc(319) checklistMatches:
>             ACL::checklistMatches: checking 'whitelist'
>             2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match:
>             aclRegexData::match: checking 'wiki.squid-cache.org:443'
>             2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match:
>             aclRegexData::match: looking for
>             '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
>             2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches:
>             ACL::ChecklistMatches: result for 'whitelist' is 0
>             2018/10/15 15:05:45.084 kid1| Acl.cc(349) matches: whitelist
>             mismatched.
>             2018/10/15 15:05:45.084 kid1| Acl.cc(354) matches: whitelist
>             result is false
>
>
>         So it's failing the regular expression check. If I use grep to
>         verify if the regex works, it does.
>
>             $ echo https://wiki.squid-cache.org/SquidFaq/SquidAcl | grep
>             "^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"
>             https://wiki.squid-cache.org/SquidFaq/SquidAcl
>
>
>         > are you aware that you can only see CONNECT in https requests, unless using
>         ssl_bump?
>
>         Ah interesting. Are you saying that my https connections will
>         always fail unless I use ssl_bump to decrypt https to http
>         connections? How would this work correctly in production? Does
>         squid proxy only block urls if it detects http? How do you
>         configure ssl_bump to work in this case? and is that viable in
>         production?
>
>         > of course it matches all, everything should match "all".
>         > I more wonder why doesn't it match "http_access allow localhost"
>
>         > have you reloaded squid config after changing it?
>         > Did squid confirm it?
>
>         Would you have an example of one entire config file that would
>         work to whitelist an http/https url using a regular expression?
>
>         Best,
>
>
>         On Mon, Oct 15, 2018 at 4:49 AM Matus UHLAR - fantomas
>         <[hidden email]> wrote:
>
>             On 15.10.18 01:04, RB wrote:
>             >I'm trying to deny all urls except for only whitelisted regular
>             >expressions. I have only this regular expression in my file
>             >"squid_sites.txt"
>             >
>             >^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
>
>             are you aware that you can only see CONNECT in https
>             requests, unless using
>             ssl_bump?
>
>
>             >acl bastion src 10.5.0.0/1
>             >acl whitelist url_regex "/vagrant/squid_sites.txt"
>             [...]
>             >http_access allow manager localhost
>             >http_access deny manager
>             >http_access deny !Safe_ports
>             >http_access allow localhost
>             >http_access allow purge localhost
>             >http_access deny purge
>             >http_access deny CONNECT !SSL_ports
>             >
>             >http_access allow bastion whitelist
>             >http_access deny bastion all
>
>             >I tried enabling debugging and tailing
>             /var/log/squid3/cache.log but my
>             >curl statement keeps matching "all".
>
>             of course it matches all, everything should match "all".
>
>             I more wonder why doesn't it match "http_access allow localhost"
>
>             >$ curl -sSL --proxy localhost:3128 -D - "
>             >https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o
>             /dev/null 2>&1 | grep
>             >Squid
>             >X-Squid-Error: ERR_ACCESS_DENIED 0
>
>             >Any ideas what I'm doing wrong?
>
>             have you reloaded squid config after changing it?
>             Did squid confirm it?
>
>             --
>             Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
>             Warning: I wish NOT to receive e-mail advertising to this
>             address.
>             Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek
>             reklamnu postu.
>             It's now safe to throw off your computer.
>             _______________________________________________
>             squid-users mailing list
>             [hidden email]
>             http://lists.squid-cache.org/listinfo/squid-users
>
>
>

Re: How to create a simple whitelist using regexes?

Matus UHLAR - fantomas
In reply to this post by RB
On 15.10.18 12:48, RB wrote:
>After some more research it looks like squid only has access to the url
>domain if it's HTTPS and the only way to get the url path and query string
>is to use ssl_bump to decrypt https so squid can see url path and query
>arguments.

This is what I wrote before; looking at it now, I should have explained
it more thoroughly.

>>> > are you aware that you can only see CONNECT in https requests, unless
>>> > using ssl_bump?

>To use ssl_bump, I have to compile the code from source with --enable-ssl,
>create a certificate, and add it to the chain of certs to every other vm
>that proxies through squid, then squid can decrypt the https urls to see
>paths and query args and finally apply the regex to those urls in order to
>only allow explicit regex urls.
>
>Is this correct?

Alex has explained already.

I would like to note that the whole purpose of SSL encryption in HTTPS is
to prevent anyone between the client and the server from seeing what the
client is accessing. That includes your proxy.

And we often see complaints about SSL bump not working, because clients
expect certificates signed by certificate authorities they already trust,
not by yours.

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody

Re: How to create a simple whitelist using regexes?

Amos Jeffries
Administrator
In reply to this post by RB
In addition to what Matus and Alex have already said about your problem,
you do not appear to understand regex patterns properly.


On 16/10/18 4:11 AM, RB wrote:

> Hi Matus,
>
> Thanks for responding so quickly. I uploaded my configurations here if
> that is more helpful: https://bit.ly/2NF4zNb
>
> The config that I previously shared is called squid_corp.conf. I also
> noticed that if I don't use regular expressions and instead use domains,
> it works correctly:
>
>     # acl whitelist url_regex "/vagrant/squid_sites.txt"
>     acl whitelist url_regex .squid-cache.org

This is still a regex. The ACL type is "url_regex" which makes the
string a regex - no matter what it looks like to your human eyes. To
Squid it is a regex.

It will match things like http://example.com/sZsquid-cacheXorg just as
easily as any sub-domain of squid-cache.org, because the unescaped dots
match any character. For example, it matches any traffic that injects
the squid-cache.org string into its path or query-string.
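You can see this concretely by testing the same pattern with grep (a
sketch; the example URL is made up for illustration):

```shell
# A made-up URL that is NOT part of squid-cache.org:
url="http://example.com/sZsquid-cacheXorg"

# The unescaped dots in the pattern match any character, so it matches anyway:
echo "$url" | grep -c ".squid-cache.org"     # prints 1

# Escaping the dots makes them literal, so the bogus URL no longer matches:
echo "$url" | grep -c "\.squid-cache\.org"   # prints 0 (grep exits non-zero)
```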



>
> Every time my squid.conf or my squid_sites.txt is modified, I restart
> the squid service
>
>     sudo service squid3 restart
>

If Squid does not accept the config file it will not necessarily restart.

You should always run "squid -k parse" or "squid3 -k parse" to check the
config before attempting a restart.
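For example (a sketch; the binary may be named squid or squid3 depending
on your package, and "-k parse" exits non-zero on a config error):

```
squid3 -k parse && sudo service squid3 restart
```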


The old Debian sysV init scripts had some safeguards that protected you
from such problems, but the newer systemd "service" tooling cannot do
that as cleanly. Either way, checking first is a good habit to get into.


>
> Then I use curl to test and now the url works. 
>
>     $ curl -sSL --proxy localhost:3128 -D -
>     https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
>     HTTP/1.1 200 Connection established
>
>     HTTP/1.1 200 OK
>     Date: Mon, 15 Oct 2018 14:47:33 GMT
>     Server: Apache/2.4.7 (Ubuntu)
>     Vary: Cookie,User-Agent,Accept-Encoding
>     Content-Length: 101912
>     Cache-Control: max-age=3600
>     Expires: Mon, 15 Oct 2018 15:47:33 GMT
>     Content-Type: text/html; charset=utf-8
>
>
> But this does not allow me to get more granular. I can only allow all
> subdomains and paths for the domain squid-cache.org, but I'm unable to
> only allow the regular expressions if I put them inline or put them in
> squid_sites.txt.
>
>     # acl whitelist url_regex "/vagrant/squid_sites.txt"
>     acl whitelist url_regex
>     ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
>     acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*

Any regex pattern that lacks the beginning (^) and ending ($) anchor
symbols can match *anywhere* in the input string.

So starting it with an optional prefix (.* or .?) or ending it with an
optional suffix (.* or .?) is redundant and confusing.


Notice how the pattern Squid is actually using lacks these prefix/suffix
parts of your patterns:

>     aclRegexData::match: looking for
>     '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
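If the goal is to match only that FAQ path, a tighter pattern anchors the
start and escapes the literal dots. A sketch, testable with grep (though,
as noted elsewhere in this thread, it can never match a CONNECT host:port
URI):

```shell
# Anchored pattern with the literal dots escaped (sketch only):
pattern='^https://wiki\.squid-cache\.org/SquidFaq/SquidAcl'

# The intended URL matches:
echo "https://wiki.squid-cache.org/SquidFaq/SquidAcl" | grep -c "$pattern"
# prints 1

# A URL that merely embeds that string does not:
echo "https://evil.example/?u=wiki.squid-cache.org/SquidFaq/SquidAcl" \
    | grep -c "$pattern"
# prints 0 (grep exits non-zero)
```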


>
>> are you aware that you can only see CONNECT in https requests, unless using
> ssl_bump?
>
> Ah interesting. Are you saying that my https connections will always
> fail

They will always fail to match your current regexes, because those
regexes contain characters that only ever appear in the path portion of
URLs (note the *L*), and never in a CONNECT message URI (note the *I*),
which contains no path portion at all.
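The mismatch is easy to reproduce outside Squid (a sketch using grep; the
host:port string is taken from the cache.log output quoted earlier):

```shell
# For an https:// request Squid evaluates the CONNECT target,
# which is only host:port -- no scheme and no path:
connect_uri="wiki.squid-cache.org:443"

# A path-based pattern can never match it:
echo "$connect_uri" | grep -c "/SquidFaq/SquidAcl"           # prints 0

# A host-level pattern can (dots escaped; sketch only):
echo "$connect_uri" | grep -c "^wiki\.squid-cache\.org:443$" # prints 1
```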


> unless I use ssl_bump to decrypt https to http connections? How
> would this work correctly in production? Does squid proxy only block
> urls if it detects http? How do you configure ssl_bump to work in this
> case? and is that viable in production?

SSL-Bump takes the CONNECT tunnel data/payload portion and _attempts_ to
decrypt any TLS inside. *If* the tunnel carries HTTPS traffic (which is
not guaranteed), that is where the full https:// ... URLs are found.
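For reference, a bumping setup is roughly of this shape on Squid 3.5 or
later (a sketch only: the cert path is a placeholder, it needs a Squid
built with SSL/TLS support, and every client must trust the signing CA):

```
# Sketch only -- the cert path is a placeholder.
http_port 3128 ssl-bump cert=/etc/squid/myCA.pem \
    generate-host-certificates=on

acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
```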

Matus and Alex have already mentioned the issues with that so I won't
cover it again.

Amos