W3C Extended Log Format

Caleb Anthony
Hello all,

I'm trying to use the logformat setting in Squid 2.6 to log in the W3C
Extended log format. More specifically, we are trying to emulate W3C
Extended logging as done by IIS:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/676400bc-8969-4aa7-851a-9319490a9bbb.mspx?mfr=true

Everything is working great, especially the patch I found here, which
added the ability to get the cs-uri-stem field:
http://www.squid-cache.org/Versions/v2/HEAD/changesets/11444.patch

That patch was a major part in getting this to work, but the only part
that is missing is the cs-uri-query field. There doesn't seem to be
any way to get just this information.

I looked at the source in access_log.c, and it seems that there was a
plan to add this functionality, but it was abandoned because of
strip_query_terms:

Line 277 access_log.c
/*LFT_REQUEST_QUERY, * // * this is not needed. see strip_query_terms */

Going by the patch above, it looks simple enough to add this
functionality, but I'm not familiar enough with the Squid code
base to know where to look.

In the above patch, it was just a matter of implementing
LFT_REQUEST_URLPATH with "out = strBuf(al->request->urlpath);".

Does something like al->request->query exist? Or would this be a
little harder to implement?

Also, here is what I have so far to get W3C logging in Squid in case
anybody else needs this log format:

date             %{%Y-%m-%d}tg
time             %{%X}tg
c-ip             %>a
cs-username      %ul
s-ip             %la
s-port           %lp
cs-method        %rm
cs-uri-stem      %rp
cs-uri-query     -
sc-status        %Hs
sc-bytes         %<st
cs-bytes         %>st
time-taken       %tr
cs-version       HTTP/%rv
cs-host          %{Host}>h
cs(User-Agent)   %{User-Agent}>h
cs(Cookie)       %{Cookie}>h
cs(Referer)      %{Referer}>h
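
For anyone assembling this into squid.conf, here is a minimal Python sketch that joins the mapping above into one logformat directive and the matching W3C "#Fields:" header line. The format name "w3c_iis" and the helper build_logformat are made up for illustration; the tokens are copied from the list, with cs-uri-query left as a literal "-".

```python
# Field name -> Squid logformat token, copied from the list in this post.
# IIS spells the last field cs(Referer), with a single "r".
FIELDS = [
    ("date", "%{%Y-%m-%d}tg"),
    ("time", "%{%X}tg"),
    ("c-ip", "%>a"),
    ("cs-username", "%ul"),
    ("s-ip", "%la"),
    ("s-port", "%lp"),
    ("cs-method", "%rm"),
    ("cs-uri-stem", "%rp"),
    ("cs-uri-query", "-"),          # no Squid token available, log a dash
    ("sc-status", "%Hs"),
    ("sc-bytes", "%<st"),
    ("cs-bytes", "%>st"),
    ("time-taken", "%tr"),
    ("cs-version", "HTTP/%rv"),
    ("cs-host", "%{Host}>h"),
    ("cs(User-Agent)", "%{User-Agent}>h"),
    ("cs(Cookie)", "%{Cookie}>h"),
    ("cs(Referer)", "%{Referer}>h"),
]


def build_logformat(name, fields):
    """Return (logformat directive, W3C #Fields header) for a field mapping."""
    directive = "logformat " + name + " " + " ".join(tok for _, tok in fields)
    header = "#Fields: " + " ".join(fname for fname, _ in fields)
    return directive, header


directive, header = build_logformat("w3c_iis", FIELDS)
print(directive)  # goes into squid.conf; "w3c_iis" is a hypothetical name
print(header)     # header line that W3C log parsers expect at the top
```

The printed directive would then be referenced by name from an access_log line. The "#Fields:" header probably has to be prepended by whatever post-processes the log files.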

One last thing, I had to use sed on the log files to convert %20's in
the User-Agent and Cookie header fields into +'s for it to really look
like IIS logs:

sed 's/%20/+/g'
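
Note that the sed command rewrites every %20 on the line, including any in cs-uri-stem. If that matters, the replacement can be limited to the header columns. Below is a rough Python sketch of that idea. It assumes the field order listed above, so that cs(User-Agent) and cs(Cookie) are columns 15 and 16 counting from zero, and that no field contains a literal space (the time format is locale dependent, so check that first). The function name plus_for_spaces is made up.

```python
# Zero-based positions of cs(User-Agent) and cs(Cookie) in the field list
# from this post. Assumption: adjust these if the logformat changes.
HEADER_COLUMNS = (15, 16)


def plus_for_spaces(line, columns=HEADER_COLUMNS):
    """Replace %20 with + only in the given space-separated columns."""
    parts = line.rstrip("\n").split(" ")
    for i in columns:
        if i < len(parts):
            parts[i] = parts[i].replace("%20", "+")
    return " ".join(parts)


# Made-up sample line in the field order above.
sample = ("2007-09-06 17:08:00 10.0.0.5 - 10.0.0.1 3128 GET /a%20b.html - "
          "200 512 340 12 HTTP/1.1 example.com Mozilla/5.0%20(X11) "
          "id=1;%20x=2 -")
converted = plus_for_spaces(sample)
print(converted)
```

In the converted line the URI stem keeps its %20 while the User-Agent and Cookie fields get a +.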

Any help or direction is appreciated.

Thanks.

Re: W3C Extended Log Format

Henrik Nordström
On Thu, 2007-09-06 at 11:08 -0600, Caleb Anthony wrote:

> Does something like al->request->query exist? Or would this be a
> little harder to implement?

Squid is not a web server and does not parse the URL into path and
query string separately; instead the query string is kept as part of
the requested path. This means that if you grab request->path you
get everything after the host, including the query.
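
To illustrate: since the stored path already carries the query, a cs-uri-query value could be derived by cutting that string at the first "?". Here is a minimal Python sketch of the split, assuming the logged path still contains the query string. It is not Squid code, only the logic a patch or a log post-processor would need, and the name split_urlpath is made up.

```python
def split_urlpath(urlpath):
    """Split a path-plus-query string at the first '?'.

    Returns (stem, query). The query is "-" when there is no query string,
    matching the placeholder used for empty W3C fields.
    """
    stem, sep, query = urlpath.partition("?")
    return stem, (query if sep and query else "-")


print(split_urlpath("/search.asp?q=squid&page=2"))  # ('/search.asp', 'q=squid&page=2')
print(split_urlpath("/index.html"))                 # ('/index.html', '-')
```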

strip_query_terms off simply cuts the logged URL after the ?

> One last thing, I had to use sed on the log files to convert %20's in
> the User-Agent and Cookie header fields into +'s for it to really look
> like IIS logs:
>
> sed 's/%20/+/g'

Right. Should add a format method for this..

Regards
Henrik


Re: W3C Extended Log Format

Henrik Nordström
On Sun, 2007-09-09 at 11:35 +0200, Henrik Nordstrom wrote:

> On Thu, 2007-09-06 at 11:08 -0600, Caleb Anthony wrote:
>
> > Does something like al->request->query exist? Or would this be a
> > little harder to implement?
>
> Squid is not a web server and does not parse the URL into path and
> query string separately; instead the query string is kept as part of
> the requested path. This means that if you grab request->path you
> get everything after the host, including the query.
>
> strip_query_terms off simply cuts the logged URL after the ?
Correction: I meant "strip_query_terms on" here.

>
> > One last thing, I had to use sed on the log files to convert %20's in
> > the User-Agent and Cookie header fields into +'s for it to really look
> > like IIS logs:
> >
> > sed 's/%20/+/g'
>
> Right. Should add a format method for this..
>
> Regards
> Henrik


Re: W3C Extended Log Format

Caleb Anthony
Ok, thanks for your help.
