Simple REGEX not working...

David A. Gershman
Hello,

I have the following in my config file:

    acl user_allowed url_regex ^https://example\.com/

but surfing to that site fails (authentication works fine).  My ultimate goal is to have an RE comparable to the PCRE of:

    ^https?:\/\/.*?example\.com\/

While the PCRE works just fine in other tools (my own scripts, online, etc.), I was unable to get it to work within Squid3.  As I stripped away pieces of the RE in the config file, the only RE which seemed to work was:

    example\.com

...not even having the ending '/'.  However, this obviously does not meet my needs.

I'm on Debian 10 and am unable to determine which RE library Debian compiled Squid3 against (I've got a Tweet out to them to see if they can point me in the right direction).

Ultimately, I would like to get Squid to use PCREs.

Ideas?

Thanks!

--David

_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Simple REGEX not working...

David A. Gershman
Hello again,

After further testing, it looks like the only thing being regex'd against is the domain name.  I shrunk the RE down to just:

    acl user_allowed url_regex http  # nothing more, just 'http'

and it still failed!!!  It's as if the "whole url" (claimed by the docs) is not being compared against.  I'm just posting this here as an FYI...no solution has been found. :(

--David


Re: Simple REGEX not working...

Amos Jeffries
Administrator
On 23/07/20 3:27 pm, David A. Gershman wrote:

> Hello again,
>
> After further testing, the looks like the only thing being regex'd
> against is the domain name.  I shrunk the RE down to just:
>
>     acl user_allowed url_regex http  # nothing more, just 'http'
>
> and it /*still*/ failed!!!  It's as if the "whole url" (claimed by the
> docs) is /not/ being compared against.  I'm just posting this here as an
> FYI...no solution has been found. :(
>

Squid uses basic regex without extensions - the basic operators that
work in both GNU regex and POSIX regex can be expected to work.
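
As a rough illustration of the point above: the lazy quantifier `.*?` in the original PCRE is a Perl-style extension that POSIX regex does not define. For a yes/no match, which is all an ACL needs, a greedy `.*` accepts the same strings. The sketch below uses Python's `re` module only to compare the two patterns on a few made-up URLs. It does not exercise Squid's regex library, and it assumes that Squid compiles ACL patterns as POSIX extended regex; the sample URLs and the helper name `same_verdict` are hypothetical.

```python
import re

# Original PCRE-style pattern from the question (lazy quantifier, escaped slashes).
PCRE_STYLE = r"^https?:\/\/.*?example\.com\/"

# Rewritten with operators common to POSIX extended regex (an assumption about
# what Squid accepts): plain slashes and a greedy ".*" instead of the lazy ".*?".
PORTABLE = r"^https?://.*example\.com/"

# Made-up sample inputs, for illustration only.
SAMPLES = [
    "https://example.com/",
    "http://www.example.com/index.html",
    "https://example.org/",
    "example.com:443",
]

def same_verdict(url: str) -> bool:
    """Return True when both patterns agree on whether the URL matches."""
    return bool(re.search(PCRE_STYLE, url)) == bool(re.search(PORTABLE, url))

for url in SAMPLES:
    # Print the URL, the portable pattern's verdict, and whether both agree.
    print(url, bool(re.search(PORTABLE, url)), same_verdict(url))
```

Laziness only changes which part of the string a match covers, not whether a match exists, so the two patterns agree on every input here.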

Your mistake is assuming that the URL always looks like "https://example.com/".

For HTTPS traffic going through an HTTP proxy, the URL is in
authority-form, which looks like "example.com:443".
<https://tools.ietf.org/html/rfc7230#section-5.3.3>
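
To make the difference concrete, here is a small sketch of the two request forms a forward proxy receives, checked against the patterns tried earlier in this thread. It uses Python's `re` rather than Squid itself, so treat it as an approximation. The request targets are made-up examples, and the `authority_form` pattern is only one hypothetical way to match a CONNECT target.

```python
import re

# What a forward proxy sees as the request target (made-up examples):
# plain HTTP arrives in absolute-form, while HTTPS arrives as a CONNECT tunnel
# whose target is in authority-form (host:port) with no scheme or path.
PLAIN_HTTP_TARGET = "http://example.com/"
HTTPS_CONNECT_TARGET = "example.com:443"

# Patterns tried in this thread, plus one hypothetical authority-form pattern.
PATTERNS = {
    "full_url": r"^https://example\.com/",
    "just_http": r"http",
    "domain_only": r"example\.com",
    "authority_form": r"^example\.com:443$",
}

def matches(pattern_name: str, target: str) -> bool:
    """Report whether the named pattern is found in the request target."""
    return re.search(PATTERNS[pattern_name], target) is not None

for name in PATTERNS:
    print(f"{name:15} CONNECT={matches(name, HTTPS_CONNECT_TARGET)} "
          f"HTTP={matches(name, PLAIN_HTTP_TARGET)}")
```

Only the bare domain pattern and the authority-form pattern find anything in the CONNECT target, which is consistent with what David observed.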


>
> On 7/22/20 7:22 PM, David A. Gershman wrote:
>> Hello,
>>
>> I have the following in my config file:
>>
>>     acl user_allowed url_regex ^https://example\.com/
>>
>> but surfing to that site fails (authentication works fine).  My
>> ultimate goal is to have an RE comparable to the PCRE of:
>>
>>     ^https?:\/\/.*?example\.com\/
>>
>> While the PCRE works just fine in other tools (my own scripts, online,
>> etc.), I was unable to get it to work within Squid3.  As I stripped
>> away pieces of the RE in the config file, the only RE which seemed to
>> work was:
>>
>>     example\.com
>>
>> ...not even having the ending '/'.  However, this obviously does not
>> meet my needs.
>>

To get to the scheme and path information for HTTPS traffic you need
SSL-Bump functionality built into the proxy and configured to decrypt
the TLS traffic layer.

The OpenSSL license currently (soon to change, yay!) does not permit Debian
to distribute a Squid binary package with that feature enabled, so you
will have to rebuild the squid package yourself with the relevant additions
or install a package from an independent repository.



>> I'm on Debian 10 and am unable to determine which RE library Debian
>> compiled Squid3 against (I've got a Tweet out to them to see if they
>> can point me in the right direction).

Squid3 was removed from Debian long ago. You should be using the
"squid" package these days, which is Squid-4 on all current Debian releases.


HTH
Amos

Re: Simple REGEX not working...

David A. Gershman
Thanks, Amos.  Ironically, I just found that out through testing and then a search that pointed me here:

    https://wiki.squid-cache.org/Features/HTTPS

Sadly, I should have thought of that.  Been a long day I guess.

Thanks again!

--David
