Squid regex grammar

Yuri Voinov
Just to clarify (this is not well documented; at least, I can't find any
documentation about it):

Does Squid's regex support only the POSIX Basic grammar?

--
**************************
* C++: Bug to the future *
**************************


_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Squid regex grammar

Amos Jeffries
Administrator
On 27/10/17 13:06, Yuri wrote:
> Just for clarify (it is not well-documented. At least I can't find any
> documentation about):
>
> Squid's regex supports only POSIX Basic grammar?
>

The specific grammar depends on your regex library used to build Squid,
so YMMV.

Basic POSIX is the only portable grammar that *all* regex libraries can
be expected to support. So Squid does not officially support other
grammars (yet) even if they work in your particular build.

Amos

Re: Squid regex grammar

Yuri Voinov


27.10.2017 12:01, Amos Jeffries wrote:

> On 27/10/17 13:06, Yuri wrote:
>> Just for clarify (it is not well-documented. At least I can't find any
>> documentation about):
>>
>> Squid's regex supports only POSIX Basic grammar?
>>
>
> The specific grammar depends on your regex library used to build
> Squid, so YMMV.
>
> Basic POSIX is the only portable grammar that *all* regex libraries
> can be expected to support. So Squid does not officially support other
> grammars (yet) even if they work in your particular build.
That's why I'm asking: POSIX Extended does not work in Squid, this is
not documented anywhere, and there is no easy way to check which regex
library Squid uses.

I tried to find it in the configure options and found this:
root @ cthulhu /patch/squid-5.0.0-patched-v2.26 # ./configure --help|grep regex
  --enable-gnuregex       Compile GNUregex. Unless you have reason to use this
                          Unix boxes which do not have their own regex library

Then I checked ldd:

root @ cthulhu /patch/squid-5.0.0-patched-v2.26 # ldd /usr/local/squid/sbin/squid
        libpthread.so.1 =>       /lib/64/libpthread.so.1
        libnettle.so.6 =>        /opt/csw/lib/amd64/libnettle.so.6
        libmd5.so.1 =>   /lib/64/libmd5.so.1
        libecap.so.3 =>  /usr/local/lib/libecap.so.3
        libatomic.so.1 =>        /opt/csw/lib/amd64/libatomic.so.1
        libssl.so.1.0.0 =>       /opt/csw/lib/amd64/libssl.so.1.0.0
        libcrypto.so.1.0.0 =>    /opt/csw/lib/amd64/libcrypto.so.1.0.0
        libkrb5.so.1 =>  /usr/lib/64/libkrb5.so.1
        libstdc++.so.6 =>        /opt/csw/lib/amd64/libstdc++.so.6
        libsocket.so.1 =>        /lib/64/libsocket.so.1
        libresolv.so.2 =>        /lib/64/libresolv.so.2
        libnsl.so.1 =>   /lib/64/libnsl.so.1
        libltdl.so.7 =>  /opt/csw/lib/amd64/libltdl.so.7
        libm.so.2 =>     /lib/64/libm.so.2
        librt.so.1 =>    /lib/64/librt.so.1
        libgcc_s.so.1 =>         /opt/csw/lib/amd64/libgcc_s.so.1
        libc.so.1 =>     /lib/64/libc.so.1
        libmp.so.2 =>    /lib/64/libmp.so.2
        libmd.so.1 =>    /lib/64/libmd.so.1
        libscf.so.1 =>   /lib/64/libscf.so.1
        libaio.so.1 =>   /lib/64/libaio.so.1
        libdoor.so.1 =>  /lib/64/libdoor.so.1
        libuutil.so.1 =>         /lib/64/libuutil.so.1
        libgen.so.1 =>   /lib/64/libgen.so.1
        mech_krb5.so.1 =>        /usr/lib/64/gss/mech_krb5.so.1
        libgss.so.1 =>   /usr/lib/64/libgss.so.1
        libpkcs11.so.1 =>        /usr/lib/64/libpkcs11.so.1
        libcmd.so.1 =>   /lib/64/libcmd.so.1
        libcryptoutil.so.1 =>    /usr/lib/64/libcryptoutil.so.1

From this output, you cannot determine which regular expression library
is being used, although maybe I'm just looking in the wrong place.

Experimentally, I found that the POSIX Extended grammar does not work in
any case.

However, I believe that such things should be well documented; otherwise,
the regular expression is simply silently ignored, which is extremely
difficult to detect.

Re: Squid regex grammar

Amos Jeffries
Administrator
On 28/10/17 02:59, Yuri wrote:

> That's why I'm asking that the POSIX Extended in the Squid does not
> work. And it is not well documented anywhere. And there is no easy way
> to check what regex library Squid uses.
>
> I'm trying to find it in configuration, found that:
> root @ cthulhu /patch/squid-5.0.0-patched-v2.26 # ./configure
> --help|grep regex
>    --enable-gnuregex       Compile GNUregex. Unless you have reason to
> use this
>                            Unix boxes which do not have their own regex
> library
>

The full text there is:
  "
   --enable-gnuregex

   Compile GNUregex. Unless you have reason to use this
   option, you should not enable it. This library file
   is usually only required on Windows and very old
   Unix boxes which do not have their own regex library
   built in.
"

If you *don't* override the local environment by setting that build
option, Squid uses whatever your build tools link to with "-lregex".

> Then see ldd:
>
> [ldd output snipped; see the previous message]
>
>  From this output, you can not determine the regular expression library
> that is being used. Although maybe I'm just not looking there.
>

I believe the -lregex ABI is presented by libstdc++ nowadays, since regex
was made part of the C++11 standard library. So it is quite difficult to see.

Or, if that Squid was built with the GNUregex setting, it will show up in
"squid -v" output rather than in the ldd dependency list.


> Experimentally, I was able to find out that the grammar of POSIX
> Extended does not work in any case.
>
> However, I believe that such things should be well documented, otherwise
> the regular expression is simply silently ignored and it is extremely
> difficult to detect.

That sounds like a library problem. If Squid receives a regex error code
from the library when compiling any regex from your squid.conf it logs
the relevant error to cache.log.
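
The "squid -v" check mentioned above could be scripted roughly as follows. This is only a sketch: it assumes that "squid -v" lists the configure options used for the build, so that a GNUregex build shows --enable-gnuregex there. The function name uses_gnuregex is hypothetical.

```shell
#!/bin/sh
# uses_gnuregex TEXT: succeed if TEXT (expected to be "squid -v" output)
# mentions the --enable-gnuregex configure option. Hypothetical helper.
uses_gnuregex() {
    case "$1" in
        *--enable-gnuregex*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# If squid is not in PATH, the captured output is empty and the
# second branch runs.
if uses_gnuregex "$(squid -v 2>/dev/null)"; then
    echo "built with the bundled GNUregex"
else
    echo "no --enable-gnuregex in squid -v output"
fi
```

A negative result only means the bundled library was not requested at build time; it does not say which system library provides the regex functions.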

Amos

Re: Squid regex grammar

Yuri Voinov


27.10.2017 20:32, Amos Jeffries wrote:

> I believe the -lregex ABI is presented by libstdc++ nowdays since
> regex was made part of the C++11 standard library. So quite difficult
> to see.
That doesn't fit, Amos. As far as I know, the standard C++ library for
regular expressions uses the ECMAScript syntax by default.

But regex ACLs show POSIX Basic behaviour. This is simple to check:
the \w and \d metacharacters do not work in regex ACLs.
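
For reference, \w and \d are not part of either POSIX grammar. The POSIX spelling is a bracket expression with a character class, which is valid in both Basic and Extended. The sketch below uses grep in its default (basic) mode as a stand-in for Squid's regex library, which is an assumption; the function names are hypothetical.

```shell
#!/bin/sh
# has_digit TEXT: succeed if TEXT contains a decimal digit.
# [[:digit:]] is the POSIX counterpart of \d.
has_digit() {
    printf '%s\n' "$1" | grep -q '[[:digit:]]'
}

# has_word_char TEXT: succeed if TEXT contains a letter, digit or underscore.
# [[:alnum:]_] is the POSIX counterpart of \w.
has_word_char() {
    printf '%s\n' "$1" | grep -q '[[:alnum:]_]'
}

if has_digit 'cdn42.example.com'; then echo "digit: yes"; else echo "digit: no"; fi
if has_word_char '---'; then echo "word char: yes"; else echo "word char: no"; fi
```

Because these classes are required by POSIX, they should behave the same whichever POSIX-conforming library a given Squid build links against.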

>
> OR, if that Squid was built with the GNUregex setting it will show up
> in "squid -v" output rather than the ldd dependency list.
>
>
>> Experimentally, I was able to find out that the grammar of POSIX
>> Extended does not work in any case.
>>
>> However, I believe that such things should be well documented, otherwise
>> the regular expression is simply silently ignored and it is extremely
>> difficult to detect.
>
> That sounds like a library problem. If Squid receives a regex error
> code from the library when compiling any regex from your squid.conf it
> logs the relevant error to cache.log.
I don't think so, because the same behaviour shows up on another server
with a different platform (OS) and a similar Squid version and
configuration. And I do not see any regex errors in cache.log.

To clarify: I asked the question not because there are errors, but
because regular expressions in the ECMAScript syntax do not work in
ACLs, without any errors. Squid simply ignores the ACL parts that use
ECMAScript grammar constructs.

Re: Squid regex grammar

Alex Rousskov
In reply to this post by Amos Jeffries
On 10/27/2017 08:32 AM, Amos Jeffries wrote:
> On 28/10/17 02:59, Yuri wrote:
>> the regular expression is simply silently ignored and it is extremely
>> difficult to detect.

> That sounds like a library problem. If Squid receives a regex error code
> from the library when compiling any regex from your squid.conf it logs
> the relevant error to cache.log.

When a regular expression is using extended features, the basic regular
expression compiler often (or even always?!) does not fail because it
views the extended features as ordinary plain characters. Thus, Squid
cannot tell that something went wrong.

I cannot give a Squid-based example quickly, but here is a related
illustration using grep (which is not exactly the same as what happens
inside Squid, but I suspect it is similar enough for illustration
purposes in this context):

> $ echo "foobar" | grep --basic-regexp    'foo|bar'
> $ echo "foobar" | grep --extended-regexp 'foo|bar'
> foobar

As you can see, the basic compiler is silent about the "|" character
that it does not support. Here is a similar example where a malformed
extended regular expression is silently accepted by the basic compiler:


> $ echo "foobar" | grep --basic-regexp 'foo(bar'
> $ echo "foobar" | grep --extended-regexp 'foo(bar'
> grep: Unmatched ( or \(


In theory, Squid itself could detect special characters unsupported by
the current regex library but doing so correctly without breaking many
existing working configurations may be impossible. On the other hand,
this validation could become an optional feature that admins can control.

The best strategy for a Squid admin working with complex regex ACLs may
be to add external test cases that validate ACL matching expectations,
but doing so requires a significant amount of work and discipline.
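
A minimal sketch of such external test cases is shown below. It uses grep as a stand-in for the regex library, so, as with the examples above, it is an assumption that Squid behaves the same way. The helper names regex_result and expect are hypothetical.

```shell
#!/bin/sh
# regex_result GRAMMAR PATTERN TEXT
# Prints "match", "no-match" or "error" for PATTERN applied to TEXT.
# GRAMMAR is "basic" (plain grep) or "extended" (grep -E). Hypothetical helper.
regex_result() {
    status=0
    case "$1" in
        basic)    printf '%s\n' "$3" | grep -q -e "$2" 2>/dev/null || status=$? ;;
        extended) printf '%s\n' "$3" | grep -E -q -e "$2" 2>/dev/null || status=$? ;;
        *)        status=2 ;;
    esac
    case "$status" in
        0) echo "match" ;;
        1) echo "no-match" ;;
        *) echo "error" ;;
    esac
}

# expect GRAMMAR PATTERN TEXT WANTED: run one test case and report ok/FAIL.
expect() {
    got=$(regex_result "$1" "$2" "$3")
    if [ "$got" = "$4" ]; then
        echo "ok:   $1 '$2' on '$3' -> $got"
    else
        echo "FAIL: $1 '$2' on '$3' -> $got (wanted $4)"
    fi
}

# The cases from the grep illustration above, written as expectations.
expect basic    'foo|bar' 'foobar'  no-match
expect basic    'foo|bar' 'foo|bar' match
expect extended 'foo|bar' 'foobar'  match
expect extended 'foo(bar' 'foobar'  error
```

Each case states both the pattern and the intended outcome, which is exactly the information the regex compiler lacks when it silently accepts an extended construct as literal text.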

Alex.

Re: Squid regex grammar

Yuri Voinov


27.10.2017 20:55, Alex Rousskov wrote:

> The best strategy for a Squid admin working with complex regex ACLs may
> be to add external test cases that validate ACL matching expectations,
> but doing so requires significant amount of work and discipline.
That's what I'm talking about. But when it comes to hundreds or
thousands of regular expressions, this approach does not seem
acceptable. Therefore, I would like the supported grammars to be
clearly documented. A simple configuration check in Squid often reports
nothing (unless there are obvious errors, such as an incomplete regex),
and in a production configuration it is extremely difficult to detect
the non-working parts of an access control list. There are also
thousands of websites.

So I would like either clear documentation or some tool for checking
whether a regular expression is correct from the point of view of the
library Squid currently uses. The existing options seem completely
unsatisfactory.

Re: Squid regex grammar

Antony Stone
On Friday 27 October 2017 at 17:06:01, Yuri wrote:

> 27.10.2017 20:55, Alex Rousskov wrote:
> >
> > When a regular expression is using extended features, the basic regular
> > expression compiler often (or even always?!) does not fail because it
> > views the extended features as ordinary plain characters. Thus, Squid
> > cannot tell that something went wrong.

> >> $ echo "foobar" | grep --basic-regexp    'foo|bar'
> >> $ echo "foobar" | grep --extended-regexp 'foo|bar'
> >> foobar
> >
> > As you can see, the basic compiler is silent about the "|" character
> > that it does not support. Here is a similar example where a malformed
> > extended regular expression is silently accepted by the basic compiler:
> >
> >> $ echo "foobar" | grep --basic-regexp 'foo(bar'
> >> $ echo "foobar" | grep --extended-regexp 'foo(bar'
> >> grep: Unmatched ( or \(

> I would like either a clear documentation

That sounds entirely reasonable - a statement something like "Squid is
guaranteed to use basic POSIX grammar, but extended grammar may be available
on different systems; the sysadmin should check"?

> or some tool for checking whether the regular expression is correct from the
> point of view of the current library used by Squid or not.

What does "correct" mean?

As Alex's examples above demonstrate, both are "correct" regexes from the
basic POSIX point of view; they just don't do what the admin might have wanted
or expected.

How could Squid know whether you expect ( in a regex to be a literal character
or a meta-character?

> The existing opportunities seem completely unsatisfactory.

Nothing documents that Squid uses other than basic POSIX grammar, so why would
you assume that it does?


Antony.

--
It is also possible that putting the birds in a laboratory setting
inadvertently renders them relatively incompetent.

 - Daniel C Dennett

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Squid regex grammar

Yuri Voinov


27.10.2017 21:17, Antony Stone wrote:

>> I would like either a clear documentation
> That sounds entirely reasonable - a statement something like "Squid is
> guaranteed to use basic POSIX grammar, but extended grammar may be available
> on different systems; the sysadmin should check"?
>
>> or some tool for checking whether the regular expression is correct from the
>> point of view of the current library used by Squid or not.
> What does "correct" mean?
"Correct" means "this will work correctly in Squid and is not silently
ignored". This is simple and obvious, isn't it?
>
> As Alex's examples above demonstrate, both are "correct" regexes from the
> basic POSIX point of view; they just don't do what the admin might have wanted
> or expected.
>
> How could Squid know whether you expect ( in a regex to be a literal character
> or a meta-character?
I expect it to follow known, documented behaviour.

And not a box of surprises that has to be investigated in each
specific configuration. Adherence to standards provides interoperability
- a familiar word?
>
>> The existing opportunities seem completely unsatisfactory.
> Nothing documents that Squid uses other than basic POSIX grammar, so why would
> you assume that it does?
Antony, the problem is that this is not documented either. Maybe someone
could take the trouble to describe the behaviour clearly in the
documentation? Because, as I said, I did not find any direct mention of
the default grammar. Am I expressing my thoughts clearly?
>
>
> Antony.
>
I asked a simple question. And I wanted a simple answer, not reasoning
about what can be and what cannot. Interoperability is a simple
thing.

Re: Squid regex grammar

Antony Stone
On Friday 27 October 2017 at 17:26:18, Yuri wrote:

> 27.10.2017 21:17, Antony Stone wrote:
> > What does "correct" mean?
>
> "correct" mean "this will correctly works in Squid, not silently
> ignored". This is simple and obvious, isn't it?

No.

Suppose I write a | character (as per Alex's first example above) in my regex.

Basic POSIX will match that literally.

Extended grep will not.

Judging purely from what is written in my regex, did I mean the character to
be matched literally, or not?

Squid cannot tell.

> Adherence to standards provides interoperability - a familiar word?

Indeed.

> I asked a simple question. And wanted a simple answer.

Maybe there isn't one.

> And not reasoning, what can be, and what can not.

Then I apologise for trying to explain.

> Interoperability is a simple thing.

Er, no, it isn't.


Antony.

--
If the human brain were so simple that we could understand it,
we'd be so simple that we couldn't.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Squid regex grammar

Yuri Voinov


27.10.2017 21:33, Antony Stone wrote:

>> "correct" mean "this will correctly works in Squid, not silently
>> ignored". This is simple and obvious, isn't it?
> No.
>
> Suppose I write a | character (as per Alex's first example above) in my regex.
>
> Basic POSIX will match that literally.
>
> Extended grep will not.
>
> Judging purely from what is written in my regex, did I mean the character to
> be matched literally, or not?
>
> Squid cannot tell.
Yes. Now you understand the root cause. If we say "Squid uses POSIX
Basic until the _admin_ specifies 'POSIX Extended' in a config option",
we can expect POSIX Basic behaviour and only that. Agreed? But the point
is: we don't know, and can't know, what library functionality exists and
what will or will not work.

So, in each separate case, we have to write a test case for EACH regex
in an ACL to make sure whether or not it will work.

Generally speaking, with thousands of regular expressions and thousands
of sites, that sounds pretty dumb, right? Many-to-many relations,
thousands of tests, etc.
>
>> Adherence to standards provides interoperability - a familiar word?
> Indeed.
>
>> I asked a simple question. And wanted a simple answer.
> Maybe there isn't one.
Noooooooo.

What could be simpler than clearly documenting the following: "Never use
anything other than POSIX Basic in regular expressions, because we do not
and cannot guarantee that it will work"?
>
>> And not reasoning, what can be, and what can not.
> Then I apologise for trying to explain.
Yes, I understand everything, Antony. It's easier to brush it off with
"Test every regular expression yourself."
>
>> Interoperability is a simple thing.
> Er, no, it isn't.
It is simple. You just have to follow standards and standard *documented*
behavior. As soon as the wild dances with home-made interpretations of
the standard begin, so do the problems.

Re: Squid regex grammar

Yuri Voinov
In reply to this post by Antony Stone
27.10.2017 21:33, Antony Stone пишет:

> On Friday 27 October 2017 at 17:26:18, Yuri wrote:
>
>> 27.10.2017 21:17, Antony Stone пишет:
>>> On Friday 27 October 2017 at 17:06:01, Yuri wrote:
>>>> 27.10.2017 20:55, Alex Rousskov пишет:
>>>>> When a regular expression is using extended features, the basic regular
>>>>> expression compiler often (or even always?!) does not fail because it
>>>>> views the extended features as ordinary plain characters. Thus, Squid
>>>>> cannot tell that something went wrong.
>>>>>
>>>>>> $ echo "foobar" | grep --basic-regexp    'foo|bar'
>>>>>> $ echo "foobar" | grep --extended-regexp 'foo|bar'
>>>>>> foobar
>>>>> As you can see, the basic compiler is silent about the "|" character
>>>>> that it does not support. Here is a similar example where a malformed
>>>>> extended regular expression is silently accepted by the basic compiler:
>>>>>> $ echo "foobar" | grep --basic-regexp 'foo(bar'
>>>>>> $ echo "foobar" | grep --extended-regexp 'foo(bar'
>>>>>> grep: Unmatched ( or \(
As for me personally, I would like ECMAScript syntax to be supported in
regular expressions of access control lists by default. :)


Re: Squid regex grammar

Alex Rousskov
In reply to this post by Yuri Voinov
On 10/27/2017 09:43 AM, Yuri wrote:

> So, in each separate case we should write a test case for EACH regex in
> an acl to make sure whether it will work or not.
>
> Generally speaking, with thousands of regular expressions and thousands
> of sites - it sounds pretty dumb, right? Many-to-many relations,
> thousands of tests, etc.

What an admin has to do is onerous, but not as bad as you make it sound:

* A handful of test cases is sufficient to validate whether Squid
instance X supports all extended regular expressions used by its ACLs.
In fact, Squid can be easily modified to run such test cases on startup!

* If you want to test that each of the 10K ACLs matches what you want it
to match, then you have to write a lot more test cases, of course, but
such deployment-specific functionality testing is a completely different
topic outside the scope of this thread.
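
The "handful of test cases" idea can be sketched roughly as follows. This
is only an illustration, not something Squid ships: grep serves as a
stand-in regex engine (plain grep for basic, grep -E for extended), and the
probe function and the three sample patterns are hypothetical. The regex
library linked into a particular Squid build may behave differently, so a
real check would have to run inside Squid itself.

```shell
#!/bin/sh
# Sketch: probe how a regex engine treats constructs that exist only in
# POSIX Extended syntax. grep is a stand-in engine here (an assumption);
# it is not the library a particular Squid build links against.

# probe MODE PATTERN INPUT
# Prints "match" or "no-match". MODE is "basic" or "extended".
probe() {
    if [ "$1" = extended ]; then
        printf '%s\n' "$3" | grep -E -q -- "$2" 2>/dev/null
    else
        printf '%s\n' "$3" | grep -q -- "$2" 2>/dev/null
    fi && echo match || echo no-match
}

# Each pattern matches "foobar" only if its extended operator is honored;
# a basic compiler treats |, + and ( ) as ordinary characters.
for pattern in '^(foo|baz)bar$' '^fo+bar$' '^(foo)bar$'; do
    printf '%-18s basic=%s extended=%s\n' "$pattern" \
        "$(probe basic "$pattern" foobar)" \
        "$(probe extended "$pattern" foobar)"
done
```

With a POSIX-conforming grep, every line is expected to report
basic=no-match and extended=match. A line reporting basic=match would
suggest that the engine accepts extended operators even in basic mode.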


> What could be simpler than clearly documenting the following: "Never use
> anything other than POSIX Basic in regular expressions because we do not
> guarantee and cannot guarantee it will work"?

You are right: Adding the above text to squid.conf.documented is fairly
simple. If Squid actually supports extended regular expressions in some
environments, then such a text will also be a bit misleading/misguiding.
Wiki updates or pull requests improving Squid documentation are always
welcome!

Personally, I cannot volunteer to add this documentation because I do
not know whether Squid can support extended regular expressions in some
environments, and I do not want to spend time adding potentially
misleading/misguiding documentation.

Long-term, we should introduce a configuration option that specifies the
exact regex flavor an admin wants _and_ forces Squid to quit if that
exact flavor is not supported by the running Squid instance.
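
A fail-fast check of that kind could look roughly like the sketch below.
It is not an existing Squid option: require_regex_flavor and the flavor
names "basic" and "extended" are hypothetical, and grep stands in for the
regex engine, so this only illustrates the quit-if-unsupported behaviour
from outside Squid.

```shell
#!/bin/sh
# Hypothetical pre-flight guard: succeed only if the requested regex flavor
# behaves as expected. grep is a stand-in engine (an assumption), not the
# library used by any particular Squid build.
require_regex_flavor() {
    flavor=$1
    case $flavor in
        basic)
            # Under basic rules '|' is an ordinary character, so the
            # pattern must match the literal line "foo|bar".
            printf '%s\n' 'foo|bar' | grep -q -x -- 'foo|bar'
            ;;
        extended)
            # Under extended rules grouping and alternation must both work.
            printf '%s\n' 'foobar' | grep -E -q -- '^(foo|baz)bar$'
            ;;
        *)
            echo "unknown regex flavor: $flavor" >&2
            return 2
            ;;
    esac || {
        echo "regex flavor '$flavor' does not behave as expected" >&2
        return 1
    }
}

# Example: a start-up wrapper would stop here instead of launching the proxy.
if require_regex_flavor extended; then
    echo "extended regex flavor available"
else
    echo "refusing to start: extended regex flavor unavailable" >&2
fi
```

Returning a distinct status for an unknown flavor name keeps a typo in the
requested flavor from being mistaken for a missing feature.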

Alex.

Re: Squid regex grammar

Alex Rousskov
In reply to this post by Yuri Voinov
On 10/27/2017 09:52 AM, Yuri wrote:

> As for me personally, I would like ECMAScript syntax to be supported in
> regular expressions of access control lists by default. :)

I think it is pointless to argue whether regex flavor X should be
supported. Once the necessary infrastructure is in place, the cost of
adding support for one more popular flavor is negligible compared to the
benefits it offers. The admin should be able to select the regex flavor
they want (even if they want a flavor that cannot look behind or match
an arbitrary character with a dot :-).

Alex.

Re: Squid regex grammar

Yuri Voinov
In reply to this post by Alex Rousskov


27.10.2017 22:01, Alex Rousskov wrote:

> On 10/27/2017 09:43 AM, Yuri wrote:
>
>> So, in each separate case we should write a test case for EACH regex in
>> an acl to make sure whether it will work or not.
>>
>> Generally speaking, with thousands of regular expressions and thousands
>> of sites - it sounds pretty dumb, right? Many-to-many relations,
>> thousands of tests, etc.
> What an admin has to do is onerous, but not as bad as you make it sound:
>
> * A handful of test cases is sufficient to validate whether Squid
> instance X supports all extended regular expressions used by its ACLs.
> In fact, Squid can be easily modified to run such test cases on startup!
>
> * If you want to test that each of the 10K ACLs matches what you want it
> to match, then you have to write a lot more test cases, of course, but
> such deployment-specific functionality testing is a completely different
> topic outside the scope of this thread.
>
>
>> What could be simpler than clearly documenting the following: "Never use
>> anything other than POSIX Basic in regular expressions because we do not
>> guarantee and cannot guarantee it will work"?
> You are right: Adding the above text to squid.conf.documented is fairly
> simple. If Squid actually supports extended regular expressions in some
> environments, then such a text will also be a bit misleading/misguiding.
> Wiki updates or pull requests improving Squid documentation are always
> welcome!
Yes. In some cases/environments POSIX Extended works, and this causes
problems when we try to carry a known-good config from one environment
to another (especially when the target is slightly different).
>
> Personally, I cannot volunteer to add this documentation because I do
> not know whether Squid can support extended regular expressions in some
> environments, and I do not want to spend time adding potentially
> misleading/misguiding documentation.
>
> Long-term, we should introduce a configuration option that specifies the
> exact regex flavor an admin wants _and_ forces Squid to quit if that
> exact flavor is not supported by the running Squid instance.
Or put a warning in cache.log. That would be ideal.
>
> Alex.