Hi squid users
Is there any way to change the request url log format for HTTPS messages?
I am using %ru to pull out the URL. When we get HTTPS connections, we see the URL logged as www.microsoft.com:443
Is there any way to reformat the log message to remove the appended port? Or, going further, to rewrite it to use https://<url>?
Thanks in advance
On 5/04/2017 6:00 p.m., daveh wrote:
> Hi squid users
> Is there any way to change the request url log format for HTTPS messages?
> I am using %ru to pull out the URL. When we get https connections, we see
> the url logged as www.microsoft.com:443
You are assuming that this URI means HTTPS. It may seem reasonable, but it is not.
The CONNECT request is a _tunnel_ request. It is an opaque *TCP* tunnel.
There is no guarantee that any given port-443 tunnel request is actually
HTTPS these days. There are WebSockets, SPDY, HTTP/2, and a number of
custom protocols inside TLS, as well as non-TLS protocols, all using the
same port.
When HTTPS does go through a port-443 tunnel, there is often more than
one HTTPS request. So writing https://blah/ to the log would be a lie,
and a deceptive one at that.
> is there any way to reformat the log message to remove the appended port?
Well, the log %ru code is intended to record the *actual* details being
received. What you are seeing is what actually exists in the traffic.
It is a URI type called "authority-form".
There is no protocol scheme, no path, no query and no fragment portions
for Squid to work with.
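A minimal Python sketch makes the point concrete (parse_authority_form is a hypothetical helper, not part of Squid): the only things that can be pulled out of an authority-form target are a host and a port.

```python
from typing import Tuple


def parse_authority_form(target: str) -> Tuple[str, int]:
    """Split an authority-form request-target ("host:port") into its two parts.

    Bracketed IPv6 literals such as "[2001:db8::1]:443" are accepted;
    anything else that is not host:port raises ValueError.
    """
    if target.startswith("["):
        # IPv6 literal: the port follows the closing bracket.
        host, sep, rest = target[1:].partition("]")
        if not sep or not rest.startswith(":"):
            raise ValueError(f"not authority-form: {target!r}")
        port = rest[1:]
    else:
        host, sep, port = target.rpartition(":")
        if not sep:
            raise ValueError(f"not authority-form: {target!r}")
    if not host or not port.isdigit():
        raise ValueError(f"not authority-form: {target!r}")
    # There is nothing else to return: no scheme, path, query or fragment.
    return host, int(port)


print(parse_authority_form("www.microsoft.com:443"))  # ('www.microsoft.com', 443)
```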
> to go further and rewrite to use https://<url>?
You can always define a log format that prints out the pieces of the URI
as separate format components "%>rs://%>rd:%>rP%>rp"
However, you will need to write that to a log separate from other traffic,
and, as mentioned above, keep in mind that port 443 does not necessarily
mean HTTPS.
Actually logging an https:// URL requires either passing Squid https:// URLs
instead of CONNECT requests, or decrypting the traffic (with the SSL-Bump
feature) and seeing what is inside the TLS (if it is TLS; it may not be).
Squid will then log the appropriate https:// URL for each received or
decrypted HTTPS request, no changes necessary.
PS: If you are asking this because some tool does broken
things when passed real URIs (not URL ... *URI*), that tool needs to be fixed.
squid-users mailing list
Thanks for the reply.
I'm parsing Squid logs to send to a SIEM to identify IOCs. The SIEM agent requires a URL to be formatted as http|https://<URI>.
It then knows it can break the string out into various components such as request URL authority, host, etc.
Your comment on logging HTTPS connections is not what I have found. I would expect that typing https://something.net would return that exact string in the log. Instead, every HTTPS connection is logged as a CONNECT with :443 appended to the FQDN. Is there something in the config to force this to happen? There doesn't seem to be a way of doing it with log formatting.
I'm simply rewriting to strip the :443 port and prepend https://. It doesn't matter to me if CONNECT != HTTPS; I simply need my URL to be properly formed in the logs.
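A minimal Python sketch of the rewrite described above, assuming the %ru value has already been extracted from each log line (connect_target_to_url is a hypothetical name). As the replies point out, the "https" scheme here is a guess, not something Squid observed.

```python
def connect_target_to_url(target: str, default_port: str = "443") -> str:
    """Rewrite an authority-form %ru value ("host:443") as "https://host".

    Lossy by design: the "https" scheme is assumed, not observed, because
    a CONNECT tunnel may carry something other than HTTPS.
    """
    if "://" in target:
        # Already an absolute URL (plain HTTP through the proxy); keep it.
        return target
    host, sep, port = target.rpartition(":")
    if not sep or not port.isdigit():
        # Not host:port; leave it alone rather than guess.
        return target
    if port == default_port:
        return f"https://{host}"
    # Keep any non-default port so it is not silently dropped.
    return f"https://{host}:{port}"


print(connect_target_to_url("www.microsoft.com:443"))  # https://www.microsoft.com
```

Keeping non-default ports is a design choice of this sketch: stripping every port would make, say, a tunnel to :8443 indistinguishable from one to :443.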
On 10/04/2017 1:36 p.m., daveh wrote:
> Thanks for the reply.
> Im parsing squid logs to send to a SIEM to identify IOCs. The SIEM agent
> requires a URL to be formatted with http|https://<URI>
> It knows then that it can break the string out into various components such
> as request URL authority, host etc
So it can understand *URL* format. But that is not what is being logged.
Squid technically logs a URI, and this log processing is one of the
cases where the difference between URI and URL matters.
> Your comment on logging https connections is not what I have found. I would
I think you misread what I wrote. There are only two ways to get Squid
to know what the https:// URL was, and neither of them is normal proxy usage.
> expect that typing https://something.net will return that extact string in
> the log. Every https connection is logged as a CONNECT with the FQDN
> appended the :443.
You expect wrong.
The URL you entered into some client software starts with the scheme
"https://", which requires that fetching that URL is done
securely. The last thing you should expect is for that URL to be sent
in plain text ("in the clear") to some external software.
To do HTTPS the client software has to setup multiple layers of
protocols and security.
1) First it has to open a TCP connection to the proxy.
2) It then has to tell the proxy where it is going, but no more
than that. Hence the CONNECT request. As per
<https://tools.ietf.org/html/rfc7230#section-5.3.3>, all that the
plain-text connection to the proxy contains is:
CONNECT www.example.com:443 HTTP/1.1
3) Then it has to set up TLS/SSL encryption over those two TCP
connections (client to proxy, proxy to server), so the crypto happens
directly between the client and the server (as if the proxy were not there).
4) Then, and only then, once all of that has succeeded, does it start
to send the first of potentially many (hundreds, thousands...) HTTP
requests over the connection:
GET /index.html HTTP/1.1
If you look closely at that step-4 request, there is no "https://"
in it, nor any way to reconstruct it.
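The two request layers from steps 2 and 4 can be sketched as raw bytes in Python (the helper names are hypothetical and step 3, the TLS handshake, is deliberately left out). Neither message carries an "https://" anywhere; the proxy sees only the first one in plain text.

```python
def connect_request(host: str, port: int) -> bytes:
    """Step 2: the plain-text CONNECT the proxy sees (authority-form target)."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def inner_request(host: str, path: str) -> bytes:
    """Step 4: an HTTP request sent inside the TLS session (origin-form target).

    A proxy that does not decrypt the tunnel only ever sees these bytes
    as ciphertext; note that they contain no scheme either.
    """
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


outer = connect_request("www.example.com", 443)
inner = inner_request("www.example.com", "/index.html")
print(outer.decode().splitlines()[0])  # CONNECT www.example.com:443 HTTP/1.1
print(inner.decode().splitlines()[0])  # GET /index.html HTTP/1.1
```

In real client code these steps are usually hidden by the library; for instance, Python's http.client performs steps 1 to 3 when an HTTPSConnection to the proxy is given a set_tunnel() target.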
It might even be another CONNECT. (Thought Tor invented onion routing?
HTTPS beat it by decades.)
That meme from The Matrix "there is no spoon" has never been more apt.
There is no "https://", at least not once the client interprets its
input URL. It vanishes right there and then.
> Is there something in the config to force this to happen?
There is no simple config option. In fact, we go out of our way to ensure
data accuracy, so the log contains reality and log interpreters can make
whatever assumptions you want them to about what they read there.
PPS: I find it particularly odd that you would be trying to feed false
information into a SIEM system; security event detection depends on
the accuracy of its inputs. But it's your neck.
> DOesnt seem to be a way of doing it with log formatting
There is that logformat directive and the codes I gave in my earlier
mail. <http://www.squid-cache.org/Doc/config/logformat/> and
If %>rs is not producing a scheme for CONNECT transactions, you could
hard-code "https". Either way, it's a good idea to log these faked-up
records to a separate log of their own.
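The same idea can be sketched as a post-processing step in Python. This assumes a custom logformat writes %>rs, %>rd, %>rP and %>rp as separate fields that the parser has already split, and that an empty field appears as "-" (both assumptions, and the function names are hypothetical). The "https" fallback is the hard-coded guess described above, so records that needed it are routed to their own output.

```python
from typing import Iterable, List, Tuple

# Assumed placeholders for an empty field (Squid usually logs "-").
MISSING = ("", "-")


def assemble_url(scheme: str, host: str, port: str, path: str,
                 fallback_scheme: str = "https") -> Tuple[str, bool]:
    """Rebuild a URL from fields logged as %>rs %>rd %>rP %>rp.

    Returns (url, faked); faked is True when the scheme was hard-coded.
    """
    faked = scheme in MISSING
    url = f"{fallback_scheme if faked else scheme}://{host}"
    if port not in MISSING:
        url += f":{port}"
    if path not in MISSING:
        url += path
    return url, faked


def split_records(
    records: Iterable[Tuple[str, str, str, str]],
) -> Tuple[List[str], List[str]]:
    """Route real URLs and faked-up ones to two separate outputs."""
    real: List[str] = []
    faked_up: List[str] = []
    for scheme, host, port, path in records:
        url, faked = assemble_url(scheme, host, port, path)
        (faked_up if faked else real).append(url)
    return real, faked_up


real, faked_up = split_records([
    ("http", "example.com", "80", "/index.html"),
    ("-", "www.microsoft.com", "443", "-"),
])
```

Here real holds "http://example.com:80/index.html" and faked_up holds "https://www.microsoft.com:443", keeping the guessed records clearly apart from the observed ones.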
Use the access_log directive to set up multiple outputs:
Thanks again for the explanation.
I'm not changing the raw Squid log, only the normalised event. I'm simply pulling out the URL host (the FQDN), as my SIEM agent doesn't natively understand how to parse these CONNECT messages. It doesn't matter to me if CONNECT requests are not always HTTPS requests. For my purposes I need to compare the FQDN to a list of IOCs.
If I have a use case specific to the use of CONNECT requests in the future, I still have all of that information as is, from the proxy.
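For the FQDN-only comparison described above, a minimal Python sketch that works on %ru values in both forms (absolute URLs for plain HTTP, authority-form for CONNECT) without inventing a scheme at all. The function names and the exact-match policy are assumptions; an IOC list that also covers subdomains would need a parent-domain lookup instead.

```python
from typing import AbstractSet
from urllib.parse import urlsplit


def fqdn_from_ru(ru: str) -> str:
    """Return the lower-cased host from a Squid %ru value.

    Handles absolute URLs ("http://host/path") and authority-form CONNECT
    targets ("host:443"). Sketch only: no IDN handling, and bracketed IPv6
    literals keep their brackets in the authority-form branch.
    """
    if "://" in ru:
        host = urlsplit(ru).hostname or ""
    else:
        # rpartition leaves "" in [0] when there is no ":", so fall back to ru.
        host = ru.rpartition(":")[0] or ru
    return host.rstrip(".").lower()


def is_ioc(ru: str, iocs: AbstractSet[str]) -> bool:
    """Exact-match the request's FQDN against a set of lower-cased IOC domains."""
    return fqdn_from_ru(ru) in iocs


iocs = {"bad.example.net"}
print(is_ioc("bad.example.net:443", iocs))           # True
print(is_ioc("http://Bad.Example.net/x?y=1", iocs))  # True
```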