Re: HTTPS cache for Java application - only getting TCP_MISS

Re: HTTPS cache for Java application - only getting TCP_MISS

Antony Stone
On Wednesday 13 June 2018 at 21:28:27, baretomas wrote:

> Hello,
>
> I'm setting up a Squid proxy as a cache for a number (as many as possible)
> of identical JAVA applications to run their web calls through.

> The problem is that none of the calls get cached: all rows in the
> access.log have a TCP_MISS/200 tag in them.
>
> I've searched all through the web for a solution to this, and have tried
> everything people have suggested. So I was hoping someone could help me?

Show us the response you get (at least the full headers, content is neither
here nor there) from the remote server.

My bet is that the website manager has used one or more "don't cache"
directives which Squid is simply faithfully obeying.
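Once you have those headers, a rough first check for such directives can be scripted. The Java sketch below is only that: CacheControlCheck is a hypothetical helper (not a Squid or JDK API). It looks at the Cache-Control header alone, splits naively on commas (quoted field lists containing commas are not handled) and ignores Expires, Vary and Authorization, so treat its verdict as a hint, not as what Squid will actually decide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a rough reading of a Cache-Control response header
// from the point of view of a shared cache such as Squid. Simplified on
// purpose: Expires, Vary, Authorization and quoted values with commas are ignored.
public class CacheControlCheck {

    // Split "a, b=1, c=\"x\"" into lower-cased directive names mapped to
    // their unquoted values; directives without a value map to "".
    public static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        if (headerValue == null) {
            return directives;
        }
        for (String part : headerValue.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int eq = token.indexOf('=');
            String name = (eq < 0 ? token : token.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : token.substring(eq + 1).trim().replace("\"", "");
            directives.put(name, value);
        }
        return directives;
    }

    // Explicit lifetime in seconds for a shared cache: s-maxage takes
    // precedence over max-age. Returns -1 when neither is present or valid.
    public static long sharedMaxAge(String headerValue) {
        Map<String, String> d = parse(headerValue);
        String raw = d.containsKey("s-maxage") ? d.get("s-maxage") : d.get("max-age");
        if (raw == null || raw.isEmpty()) {
            return -1;
        }
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // One-line verdict on what the header asks of a shared cache.
    public static String verdict(String headerValue) {
        Map<String, String> d = parse(headerValue);
        if (d.containsKey("no-store")) {
            return "not stored: no-store";
        }
        // private="field" / no-cache="field" only restrict those fields, so
        // only the value-less forms are treated as whole-response directives.
        if ("".equals(d.get("private"))) {
            return "not stored by a shared cache: private";
        }
        if ("".equals(d.get("no-cache"))) {
            return "stored, but revalidated before every reuse: no-cache";
        }
        long lifetime = sharedMaxAge(headerValue);
        if (lifetime == 0) {
            return "stored, but stale immediately: zero lifetime";
        }
        if (lifetime > 0) {
            return "fresh for " + lifetime + "s";
        }
        return "no explicit lifetime: heuristics / refresh_pattern apply";
    }

    public static void main(String[] args) {
        for (String h : new String[] {"private, max-age=0", "no-cache", "max-age=10", "public"}) {
            System.out.println(h + " -> " + verdict(h));
        }
    }
}
```

Feed it the Cache-Control value from the server's real response; a "not stored" verdict would match the TCP_MISS pattern you are seeing.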


Antony.

--
Please apologise my errors, since I have a very small device.

                                                   Please reply to the list;
                                                         please *don't* CC me.
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: HTTPS cache for Java application - only getting TCP_MISS

Antony Stone
On Wednesday 13 June 2018 at 21:28:27, baretomas wrote:

> The calls from the application are done using SSL / HTTPS by telling Java to
> use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost).

Okay, but...

> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem

> # certificate generation program
> sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB

> acl step1 at_step SslBump1
>
> ssl_bump peek step1
> ssl_bump bump all

Surely all this peeking and bumping is only needed if you're running Squid in
interception mode, whereas you've said that you've configured your Java
application to explicitly use Squid as a proxy?


Have you tried your Squid configuration with a plain browser, configured to use
the proxy, with (a) a few random websites, and (b) the specific resource you're
trying to access from your Java application, to see whether it is actually
working as a caching proxy?


Antony.

--
This sentence contains exacly three erors.


Re: HTTPS cache for Java application - only getting TCP_MISS

Antony Stone
In reply to this post by Antony Stone
On Thursday 14 June 2018 at 09:09:05, Tomas Finnøy wrote:

> > Surely all this peeking and bumping is only needed if you're running
> > Squid in interception mode, whereas you've said that you've configured
> > your Java application to explicitly use Squid as a proxy?
>
> I found some "how-to's" and posts that were explaining how to make a https
> cache proxy, and they were all mentioning bumping. Isn't the bump needed
> to decrypt the response, so it is possible to store it in the cache?

No, because when you explicitly configure a browser (or in your case a Java
application) to use a proxy, it sends a request to the proxy saying "please go
and fetch something from this URI for me", and Squid then does all the HTTPS
negotiations needed to talk to the remote server.  What Squid gets back is the
plain unencrypted content, which it can then pass on to the browser (or
application), and if it's allowed to (by whatever it finds in the headers of
the response) it can also cache it.

> I don't need any acl with peek and bump for my scenario at all, is what you
> are saying?

Correct.

> > Have you tried your Squid configuration with a plain browser, configured
> > to use the proxy, with (a) a few random websites, and (b) the specific
> > resource you're trying to access from your Java application, to see
> > whether it is actually working as a caching proxy?
>
> No. And something I will do now. Thanks for the tips.

No problem.  Just suggesting "start simple" before moving on to several
complex things interacting with each other...

> Sorry for the messy formatting here, but I didn't get your responses to my
> mail. I only saw them in the archives and copied them over to my mail here....

Hm, odd, I see my reply on the list just as normal.


Antony.

--
I thought of going into banking, until I lost interest.


Re: HTTPS cache for Java application - only getting TCP_MISS

Amos Jeffries
Administrator
In reply to this post by Antony Stone
On 14/06/18 07:28, baretomas wrote:

> Hello,
>
> I'm setting up a Squid proxy as a cache for a number (as many as possible)
> of identical JAVA applications to run their web calls through. The calls are
> of course identical, and the response they get can safely be cached for 5-10
> seconds.
> I do this because most of the calls are directed at a single server on the
> internet that I don't want to hammer, since I will of course be locked out of
> it then.
>
> Currently I'm simply testing this on a single computer: the application and
> Squid.
>
> The calls from the application are done using SSL / HTTPS by telling Java to
> use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set up
> Squid and Java with self-signed certificates, and the application sends its
> calls through Squid and gets the response. No problem there (that wasn't easy
> either, I must say :P ).

I was going to ask what was so hard about it. Then I looked at your
config and see that you are in fact using NAT interception instead of
the easy way.

So what _exactly_ do those -D options cause the Java applications to do
with the proxy?
 I have some suspicions, but am not familiar enough with the Java API, and
the specific details are critical to what you need the proxy to be doing.


>
> The problem is that none of the calls get cached: all rows in the access.log
> have a TCP_MISS/200 tag in them.
>
> I've searched all through the web for a solution to this, and have tried
> everything people have suggested. So I was hoping someone could help me?
>
> Anyone have any tips on what to try?
>

There are three ways to do this:

1) If you own the domain the apps are connecting to, set up the proxy as
a normal TLS / HTTPS reverse-proxy.

2) If you have enough control of the apps to get them connecting with
TLS *to the proxy* and sending their requests there, do that.

3) The (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the messages sent by apps and servers. Caching
is a luxury here, easily broken / prevented.

Well, there is a fourth way with intercept. But that is a VERY last
resort; you already have (3) going, and that is already better than
intercept. Getting to (1) or (2) would be simplest if you meet the "if
..." requirements for those.



> MY config (note I've set the refresh_pattern like that just to see if I could
> catch anything. The plan is to modify it so it actually does refresh the
> responses from the web calls in 5-10 second intervals. There are commented
> out parts I've tried with no luck there too):
>
...

Ah. The way you write that implies a misunderstanding about refresh_pattern.

HTTP has some fixed algorithms written into the protocol that caches are
required to perform to determine if any object stored can be used or
requires replacement.

The parameters used by these algorithms come in the form of headers in
the originally stored reply message and in the current client's request.
Sometimes they require revalidation, which is a quick check with the
server for updated instructions and/or content.

What refresh_pattern actually does is provide default values for those
algorithm parameters IF any one (or more) of them are missing from those
HTTP messages.
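A simplified Java model of that per-response decision may make the "defaults only" point clearer. It is an assumption-laden sketch modelled on the order of checks in Squid's refresh.cc (explicit expiry first, then max, then the Last-Modified percentage, then min); RefreshModel and its parameters are hypothetical, and the override-*/ignore-* options, request headers and revalidation are all left out. Note that min and max are whole minutes in squid.conf, so a 5-10 second window cannot be written directly as a min or max value.

```java
// Hypothetical, simplified model of the freshness check Squid performs with
// the refresh_pattern line that matched the URL. Modelled on the order used in
// Squid's refresh.cc; override-* / ignore-* options, request Cache-Control
// directives and revalidation are deliberately left out.
public class RefreshModel {

    // One refresh_pattern line. In squid.conf, min and max are whole MINUTES.
    public static final class Pattern {
        final long minMinutes;
        final double percent;
        final long maxMinutes;

        public Pattern(long minMinutes, double percent, long maxMinutes) {
            this.minMinutes = minMinutes;
            this.percent = percent;
            this.maxMinutes = maxMinutes;
        }
    }

    // ageSec:        seconds since the response was stored.
    // expiresInSec:  seconds left until the explicit expiry derived from
    //                max-age/Expires (<= 0 means already expired), or null
    //                when the server sent no explicit lifetime.
    // lastModAgeSec: Date minus Last-Modified at storage time, in seconds,
    //                or null when there is no Last-Modified header.
    public static boolean isFresh(Pattern p, long ageSec, Long expiresInSec, Long lastModAgeSec) {
        // 1. An explicit lifetime from the server decides; min/percent/max are not used.
        if (expiresInSec != null) {
            return expiresInSec > 0;
        }
        // 2. Older than the pattern's max: stale.
        if (ageSec > p.maxMinutes * 60) {
            return false;
        }
        // 3. Last-Modified heuristic: fresh while younger than `percent` of
        //    how old the object already was when it was stored.
        if (lastModAgeSec != null && lastModAgeSec > 0) {
            return ageSec < (long) (lastModAgeSec * p.percent / 100.0);
        }
        // 4. Nothing else to go on: fresh only while younger than min.
        return ageSec < p.minMinutes * 60;
    }

    public static void main(String[] args) {
        Pattern catchAll = new Pattern(1440, 100, 4320); // "refresh_pattern . 1440 100% 4320"
        System.out.println(isFresh(catchAll, 5, 0L, null));   // server sent max-age=0
        System.out.println(isFresh(catchAll, 5, null, null)); // no caching headers at all
    }
}
```

The first call shows an explicit zero lifetime winning over the pattern's defaults; only the second, header-less case falls through to min.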


The proper way to make caching happen with your desired behaviour is for
the server to present an HTTP Cache-Control header saying the object is
cacheable (i.e. does not forbid caching), but not for more than 10 seconds:

 Cache-Control: max-age=10

OR to say that objects need revalidation (i.e. Cache-Control: no-cache)
and then answer those revalidation checks with a 304 status. (Yeah, that's
right: "no-cache" means *do* cache, but check with the server before reuse.)
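To make those two server-side options concrete, here is a hypothetical origin sketched with the JDK's built-in com.sun.net.httpserver package: /short sends max-age=10, while /revalidate sends no-cache with an ETag and answers a matching If-None-Match with 304. The paths, body and ETag value are made up, and only an exact If-None-Match match is handled; a real server would also deal with weak ETags and lists.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical origin server illustrating the two header strategies:
//   /short      -> Cache-Control: max-age=10 (reusable for 10 seconds)
//   /revalidate -> Cache-Control: no-cache + ETag, 304 on a matching If-None-Match
public class CacheFriendlyOrigin {

    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        byte[] body = "example response".getBytes(StandardCharsets.UTF_8);
        String etag = "\"v1\""; // hypothetical version tag of the current content

        server.createContext("/short", exchange -> {
            exchange.getResponseHeaders().set("Cache-Control", "max-age=10");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.createContext("/revalidate", exchange -> {
            exchange.getResponseHeaders().set("Cache-Control", "no-cache");
            exchange.getResponseHeaders().set("ETag", etag);
            String ifNoneMatch = exchange.getRequestHeaders().getFirst("If-None-Match");
            if (etag.equals(ifNoneMatch)) {
                // Unchanged: tell the cache to reuse its stored copy (no body).
                exchange.sendResponseHeaders(304, -1);
                exchange.close();
                return;
            }
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

With the /revalidate behaviour a cache stores the body once and afterwards only exchanges small 304 replies with the server.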

That said, I doubt you really want to force that, and I suspect you would
be happy if the server told the proxy it was safe to cache an object for
several minutes, or for any value larger than 10 seconds.


So what we circle back to is that you are probably trying to force
things to be cached and used long past their actual safe-to-use lifetimes
as specified by the devs most authoritative on that subject (under
10 seconds?). As you should be aware, this is a highly unsafe thing to do
unless you are one of those devs, so be very careful what you choose to do.


>
>
> # Squid normally listens to port 3128
> #http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem key=/cygdrive/c/squid/etc/squid/ssl/myca.key
>
> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
>
> #https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
>

Hmm. This is a Windows machine running Cygwin?
FYI: Performance is going to be terrible. It may not be super relevant
yet. Just be aware that Windows limits the number of usable sockets per
application to far fewer than a typical proxy requires.
The Cygwin people do a lot, but they cannot solve some OS limitation
problems.

To meet the "as many as possible" requirement from your very first
sentence, you will need a non-Windows machine to run the proxy on. That
simple change will get you something around 3 orders of magnitude higher
peak client capacity on the proxy.


>
> # Uncomment the line below to enable disk caching - path format is /cygdrive/<full path to cache folder>, i.e.
> #cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
>
> # certificate generation program
> sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
>
> # Leave coredumps in the first cache dir
> coredump_dir /var/cache/squid
>
> # Add any of your own refresh_pattern entries above these.
> #refresh_pattern ^ftp: 1440 20% 10080
> #refresh_pattern ^gopher: 1440 0% 1440
> #refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> #refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth
> refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth override-lastmod
>


* ignore-must-revalidate actively *reduces* caching, because it disables
several of the widely used HTTP mechanisms that rely on revalidation to
allow things to be stored in a cache.
 It is *only* beneficial if the server is broken, i.e. it requires
revalidation but does not support revalidation.


* ignore-auth has the same unintuitive effect as ignoring revalidation,
again reducing caching ability.
 This is only useful if you want to prevent caching of contents which
require any form of login to view. High security networks dealing with
classified or confidential materials find this useful; regular Internet
admins, not so much.


* ignore-no-store is highly dangerous and rarely necessary. The "nuclear
option" for caching. It has the potential to eradicate user privacy and
scramble up any server personalized content (not in a good way).
 This is a last resort intended only to cope with severely braindead
applications. YMMV whether you have to deal with any of those; just
treat this as an absolute last resort rather than something to play with.


Overall, in order to use these refresh_pattern controls you *need* to
know what the HTTP(S) messages going through your proxy contain in terms
of caching headers AND what those messages are doing semantically /
content wise for the client application. Using any of them as a generic
"makes caching better" thing only leads to problems in today's HTTP protocol.


> # Bumped requests have relative URLs so Squid has to use reverse proxy
> # or accelerator code. By default, that code denies direct forwarding.
> # The need for this option may disappear in the future.
> #always_direct allow all
>
> dns_nameservers 8.8.8.8 208.67.222.222

Use of 8.8.8.8 is known to be explicitly detrimental to caching
intercepted traffic.

Those servers present different result sets based on the timing and IP
sending the query. The #1 requirement of caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same
view of DNS system contents. Having the DNS reply contents change
between two consecutive and identical queries breaks that requirement.


>
> max_filedescriptors 3200
>
> # Max Object Size Cache
> maximum_object_size 10240 KB
>
>
> acl step1 at_step SslBump1
>
> ssl_bump peek step1
> ssl_bump bump all

This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own
capabilities. The proxy knows nothing of the server's crypto capabilities
that it could use to ensure traffic can actually make it to the server.

You are rather lucky that it actually worked at all. Almost any
deviation (e.g. emergency security updates in the future) at either the
client, server or proxy endpoints risks breaking the communication
through this proxy.

Ideally there would be a stare action for step2 and then bump only at
step 3.




So, in summary, the things to try to get better caching:

* ditch 8.8.8.8. Use a local DNS resolver within your own network,
shared by clients and proxy. That can use 8.8.8.8 itself, the important
part is that it should be responsible for caching DNS results and
ensuring the app clients and Squid see as much the same records as possible.

* try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
messages being decrypted in the proxy. Look at those headers to see why
they are not caching normally. Use that info to inform your next
actions. It cannot tell you how the message is used by the application;
hopefully you can figure that out somehow before forcing anything unnatural.

* if you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP level mistakes in the apps that
could be fixed for better cacheability.

Amos

Re: HTTPS cache for Java application - only getting TCP_MISS

Amos Jeffries
Administrator
In reply to this post by Antony Stone
On 14/06/18 07:44, Antony Stone wrote:

> On Wednesday 13 June 2018 at 21:28:27, baretomas wrote:
>
>> The calls from the application are done using SSL / HTTPS by telling Java to
>> use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost).
>
> Okay, but...
>
>> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
>
>> # certificate generation program
>> sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
>
>> acl step1 at_step SslBump1
>>
>> ssl_bump peek step1
>> ssl_bump bump all
>
> Surely all this peeking and bumping is only needed if you're running Squid in
> interception mode,

Not quite. SSL-Bump is interception of the TLS layer. Regular / forward
/ explicit proxies use it to decrypt the CONNECT messages transporting
HTTPS traffic through tunnels.


> whereas you've said that you've configured your Java
> application to explicitly use Squid as a proxy?
>

The proxy port and SSL-Bump config is consistent with an SSL-Bumping
forward proxy.

I suspect -Dhttp.proxyHost is the Java apps' equivalent of the
http_proxy environment variables we are more familiar with seeing
applications on Linux use to connect to that type of proxy.
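If that suspicion holds, the -D flags feed Java's default ProxySelector, which both HttpsURLConnection and java.net.http.HttpClient (unless given another selector) consult. The sketch below, with a hypothetical class name and target URL, shows which proxy the JVM would pick for an https:// URL once those properties are set. For such a URL those clients then send "CONNECT host:443" to that proxy and run TLS inside the tunnel, which is why Squid only sees the HTTP requests if it bumps.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Sketch: what -Dhttps.proxyHost / -Dhttps.proxyPort feed into.
public class ProxyPropsDemo {

    // Ask the JVM-wide default ProxySelector which proxy it would use for a URL.
    public static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Equivalent to starting the JVM with
        //   -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128
        System.setProperty("https.proxyHost", "127.0.0.1");
        System.setProperty("https.proxyPort", "3128");

        Proxy p = proxiesFor("https://api.example.com/data").get(0); // hypothetical URL
        if (p.type() == Proxy.Type.DIRECT) {
            System.out.println("DIRECT (no proxy)");
        } else {
            InetSocketAddress addr = (InetSocketAddress) p.address();
            // For an https:// URL this is an HTTP proxy; the client tunnels
            // through it with CONNECT rather than sending a plain GET.
            System.out.println(p.type() + " proxy " + addr.getHostString() + ":" + addr.getPort());
        }
    }
}
```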

>
> Have you tried your Squid configuration with a plain browser, configured to use
> the proxy, with (a) a few random websites, and (b) the specific resource you're
> trying to access from your Java application, to see whether it is actually
> working as a caching proxy?
>

Good idea.


Amos

Re: HTTPS cache for Java application - only getting TCP_MISS

Alex Rousskov
In reply to this post by Amos Jeffries
On 06/14/2018 01:32 PM, baretomas wrote:

> On 14 June 2018 1:25 PM, Amos Jeffries <[hidden email]> wrote:
>> 2.  if you have enough control of the apps to get them connecting with
>>     TLS to the proxy and sending their requests there. Do that.

You are not doing this if your Squid receives CONNECT requests. If you
can get your apps to do the right thing, then Squid would be receiving
GET requests (and such) with https:// URLs instead of CONNECT requests.


>> 3.  the (relatively) complicated SSL-Bump way you found. The proxy is
>>     fully at the mercy of the messages sent by apps and servers.

You are doing this right now. Some Java magic encrypts your app requests
and sends encrypted requests through Squid via CONNECT tunnels. You bump
those encrypted tunnels to get to the HTTP requests and cache responses.
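Roughly, the difference is in the first bytes Squid receives. The Java sketch below only builds those request heads as strings for a made-up host and path; ProxyRequestHeads is hypothetical, real clients add more headers, and in the second case these bytes travel inside the client's own TLS connection to the proxy.

```java
// Hypothetical illustration of the request head Squid receives in each set-up.
// Plain strings only; nothing is sent over the network.
public class ProxyRequestHeads {

    // What the Java proxy settings produce today: an opaque tunnel. Squid only
    // learns host:port; the real GET and its response stay encrypted inside
    // the tunnel unless Squid bumps it.
    public static String connectHead(String host, int port) {
        String authority = host + ":" + port;
        return "CONNECT " + authority + " HTTP/1.1\r\n"
             + "Host: " + authority + "\r\n"
             + "\r\n";
    }

    // Option (2): the client talks TLS to the proxy itself and asks for the
    // absolute https:// URL, so Squid sees (and can cache) the request directly.
    public static String absoluteGetHead(String host, String path) {
        return "GET https://" + host + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(connectHead("api.example.com", 443));
        System.out.print(absoluteGetHead("api.example.com", "/data"));
    }
}
```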

Alex.


> According to the Java docs, the HTTPS proxy settings (-Dhttps.proxyHost and
> -Dhttps.proxyPort) should redirect all SSL traffic to that destination.