Content Adaptation with HTTPs


Christopher Ahrens
I am looking for guidance on doing content adaptation with HTTPS traffic
on my network to support accessibility aids, such as increasing the
contrast between text and the background (modifying font color and
background tags, removing background images) or removing extraneous
content such as social media buttons and external JavaScript, and
removing auto-play from audio streams (so that the audio stream does not
drown out the screen reader).

Right now I am doing this with AdBlockPlus + Element Hider and GreaseMonkey.

My goal here is to essentially do the same as those tools but for the
entire network so that these changes can still be applied for devices
that do not support extensions and the like.
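
As a rough illustration of the kind of rewriting described above, here is a minimal Python sketch. It assumes a complete, already-decoded HTML document held in memory, which is simpler than what a proxy adapter actually receives. The class name `AccessibilityFilter`, the function `adapt`, and the choice of rules (drop `background` attributes, drop `autoplay` on audio/video, drop scripts that have a `src`) are illustrative assumptions, not part of any existing ICAP/eCAP service.

```python
from html import escape
from html.parser import HTMLParser


class AccessibilityFilter(HTMLParser):
    """Re-serialises HTML while dropping a few distracting features."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_dropped_script = False

    def _render(self, tag, attrs, closing=">"):
        parts = ["<" + tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "controls"
                parts.append(" " + name)
            else:
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts) + closing

    def _filter(self, tag, attrs):
        # Hypothetical rules: no background images, no auto-playing media.
        return [(n, v) for n, v in attrs
                if n != "background"
                and not (tag in ("audio", "video") and n == "autoplay")]

    def handle_starttag(self, tag, attrs):
        if tag == "script" and any(n == "src" for n, _ in attrs):
            self.in_dropped_script = True  # external script: drop it entirely
            return
        self.out.append(self._render(tag, self._filter(tag, attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, self._filter(tag, attrs), " />"))

    def handle_endtag(self, tag):
        if tag == "script" and self.in_dropped_script:
            self.in_dropped_script = False
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_dropped_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def adapt(html_text):
    """Return html_text with the filter rules applied."""
    parser = AccessibilityFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `adapt('<audio autoplay src="a.mp3"></audio>')` returns `<audio src="a.mp3"></audio>`. Using a parser rather than regexes keeps attribute quoting and entity references intact, at the cost of needing the whole decoded document.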

I looked at the ICAP services available and nothing there will work for
my purposes.

-Christopher
_______________________________________________
squid-users mailing list
[hidden email]
http://lists.squid-cache.org/listinfo/squid-users

Re: Content Adaptation with HTTPs

Eliezer Croitoru
Hey Christopher,

For such a solution you will be required to have a content adaptation service that was designed to render the JS and other content in the page.
It's not something you would find out there just waiting for you since a lot of work is required to write such a piece of software.
The current solution makes sense.

All The Bests,
Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]




Re: Content Adaptation with HTTPs

Christopher Ahrens

The current solution doesn't work for me since it only supports a very
limited number of clients.  I am working with a charity that provides
internet services to those with impaired vision. The intention of my
project was to set up a semi-public proxy for recipients of the charity
(e.g., we would install DD-WRT-like routers within their homes that would
create a tunnel into our network) so that they could browse the internet
using off-the-shelf systems.  We recently received a large number of
tablets from a corporate donor. The tablets themselves will work for our
recipients, but unfortunately the internet at large does not.

We've looked into commercial systems in the past, but we cannot afford
the cost of commercial systems, especially since we are unsure about the
exact licensing that would be needed for our endeavor.  We have also
been burnt in the past with commercial software where the project either
goes dead, begins to require insanely expensive appliances, or the
license price is sent sky-high.

Would it be possible to use a setup of Squid <-> Privoxy <-> Squid to
execute this?  I figure we'd build an internal instance that will handle
the client<->proxy part, Privoxy handles the content modification, then
a second Squid instance to handle the web server<->proxy part.

So it looks like the solution would be to find a developer to write an
eCAP adapter that cycles through regexes to replace/remove HTML/CSS
content.  So time to dig out my old C++ books and get to work...

-Christopher


Re: Content Adaptation with HTTPs

babajaga
This post has NOT been accepted by the mailing list yet.
I developed a stand-alone proxy to do content adaptation, but for HTTP only. It has been in commercial use for a couple of years already.
If this is a real charity org, I might make a donation. However, I have been burnt by some "charity orgs" that were supposed to be helping refugees in the EU, so I will be critical.
Content adaptation of HTTPS will be a PITA, I suspect.

Re: Content Adaptation with HTTPs

Amos Jeffries
Administrator
In reply to this post by Christopher Ahrens
On 20/08/17 16:05, Christopher Ahrens wrote:

>
> The current solution doesn't work for me since it only supports a very
> limited number of clients.  I am working with a charity that provides
> internet services to those with impaired vision. The intention of my
> project was to set up a semi-public proxy for recipients of the charity
> (e.g., we would install DD-WRT-like routers within their homes that would
> create a tunnel into our network) so that they could browse the internet
> using off-the-shelf systems.  We recently received a large number of
> tablets from a corporate donor. The tablets themselves will work for our
> recipients, but unfortunately the internet at large does not.

FYI: If you can get the adaptation part to be small enough, a non-caching
Squid should be able to run on those WRT-like devices with under 32 MB
of RAM. So the tunnel may not be necessary, just a way to update
the software and its config.

>
> We've looked into commercial systems in the past, but we cannot afford
> the cost of commercial systems, especially since we are unsure about the
> exact licensing that would be needed for our endeavor.  We have also
> been burnt in the past with commercial software where the project either
> goes dead, begins to require insanely expensive appliances, or the
> license price is sent sky-high.
>
> Would it be possible to use a setup of Squid <-> Privoxy <-> Squid to
> execute this?  I figure we'd build an internal instance that will handle
> the client<->proxy part, Privoxy handles the content modification, then
> a second Squid instance to handle the web server<->proxy part.

Squid will only send SSL-Bump'ed HTTPS traffic over encrypted
connections. So that is only possible if privoxy accepts TLS connections
from Squid. In which case you probably do not need the second Squid, as
privoxy would also be doing the HTTPS to-server part easily enough itself.


>
> So it looks like the solution would be to find a developer to write an
> eCAP adapter that cycles through regexes to replace/remove HTML/CSS
> content.  So time to dig out my old C++ books and get to work...

If the existing ICAP/eCAP options are not suitable, then yes a custom
one would be needed.

It is not as easy as a few regex replacements though. Adaptors are
streamed the full on-wire HTTP message format with only minor
sanitization by Squid's parser. To alter the content you will have to
deal with data encodings, object ranges, and partially received objects.
And it is best to assume everything is of infinite length unless
explicitly told otherwise, so no buffer-then-adapt code.
eCAP is simpler than ICAP, but still has to deal with these HTTP features.
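
To make the "no buffer-then-adapt" point concrete, here is a minimal Python sketch of a regex substitution that works on a stream of chunks and only ever holds a bounded tail in memory. It is an illustration under stated assumptions, not eCAP or ICAP code: the function name `stream_sub` and the `max_match` parameter are hypothetical, the body is assumed to be already decoded (no gzip, no ranges), the pattern must never match the empty string, and no match (including one byte of look-ahead context such as `\b`) may be longer than `max_match` bytes.

```python
import re
from typing import Iterable, Iterator


def stream_sub(chunks: Iterable[bytes], pattern: bytes, repl: bytes,
               max_match: int = 256) -> Iterator[bytes]:
    """Replace every match of pattern with the literal bytes repl,
    yielding output as soon as it can no longer be affected by later input."""
    regex = re.compile(pattern)
    pending = b""
    for chunk in chunks:
        pending += chunk
        # A match starting before `safe` is guaranteed to lie fully inside
        # `pending`; anything starting later might still grow or change.
        safe = len(pending) - (max_match - 1)
        if safe <= 0:
            continue
        out = []
        pos = 0
        for m in regex.finditer(pending):
            if m.start() >= safe:
                break  # too close to the end: decide after more data arrives
            out.append(pending[pos:m.start()])
            out.append(repl)
            pos = m.end()
        cut = max(pos, safe)
        out.append(pending[pos:cut])
        pending = pending[cut:]  # keep only the undecided tail
        data = b"".join(out)
        if data:
            yield data
    # End of stream: nothing more can arrive, so the tail is final.
    tail = regex.sub(lambda m: repl, pending)
    if tail:
        yield tail
```

The retained tail is what lets a match that straddles two chunks still be found. A real adapter would additionally have to undo and redo any content encoding, respect range requests, and cope with bodies that end early.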

Those are a big part of why available software is so sparse. The other
part is that HTTP traffic payloads are copyrighted content, so there
are legal issues with selling software for the purpose of altering
copyrighted content without the author's permission.

Amos

Re: Content Adaptation with HTTPs

Christopher Ahrens
Amos Jeffries wrote:

> FYI: If you can get the adaptation part to be small enough a non-caching
> Squid should be able to run on those WRT-like devices with under 32 MB
> of RAM needed. So the tunnel may not be necessary, just a way to update
> the software and its config.

Part of it is to pre-shrink the pages to prevent saturating the tunnel.
A lot of our recipients have low-cost internet connections (usually
between 1 and 5 Mbps).  In my personal experience, the transformations
are probably cutting about 75-80% of the excess garbage from websites.
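
A quick back-of-the-envelope calculation shows what those figures could mean in practice. This Python sketch assumes a hypothetical 2 MB page (that size is not from the thread) and ignores latency, TCP ramp-up, and tunnel overhead, so the numbers are only a lower bound on transfer time.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Ideal time to move size_bytes over a link of link_mbps megabits/s."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


PAGE_BYTES = 2_000_000              # assumed page weight, for illustration
TRIMMED_BYTES = PAGE_BYTES // 4     # after removing 75% of the content

for mbps in (1, 5):
    before = transfer_seconds(PAGE_BYTES, mbps)
    after = transfer_seconds(TRIMMED_BYTES, mbps)
    print(f"{mbps} Mbps: {before:.1f} s before, {after:.1f} s after")
```

Under these assumptions the page takes 16 s on a 1 Mbps link before trimming and 4 s after, so doing the trimming before the traffic enters the tunnel matters most for the slowest connections.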

We're also looking at possibly building tiny x86 or ARM-based boxes that
can be deployed to their homes to do caching to further reduce the load
on their internet connections.  The biggest complaint we have is why it
takes so long to load pictures and words especially since a lot of the
pictures are the same page-to-page (I am having a very hard time arguing
with them...)

We can get a lot of hardware from local companies, but not so much in
the way of software or services

> Squid will only send SSL-Bump'ed HTTPS traffic over encrypted
> connections. So that is only possible if privoxy accepts TLS connections
> from Squid. In which case you probably do not need the second Squid, as
> privoxy would also be doing the HTTPS to-server part easily enough itself.
>

Unfortunately Privoxy doesn't do HTTPS.  We looked into using it, but it
can only do domain blocking for HTTPS, not content manipulation.


> If the existing ICAP/eCAP options are not suitable, then yes a custom
> one would be needed.
>
> It is not as easy as a few regex replacements though. Adaptors are
> streamed the full on-wire HTTP message format with only minor
> sanitization by Squid's parser. To alter the content you will have to
> deal with data encodings, object ranges, and partially received objects.
> And it is best to assume everything is of infinite length unless
> explicitly told otherwise, so no buffer-then-adapt code.
> eCAP is simpler than ICAP, but still has to deal with these HTTP features.
>
> Those are a big part of why available software is so sparse. The other
> part is that HTTP traffic payloads are copyrighted content, so there
> are legal issues with selling software for the purpose of altering
> copyrighted content without the author's permission.
>

Yeah, I was a bit afraid that would be the case.  I was planning on
seeing how GreaseMonkey and ABP handle data streams since they seem to
be able to handle streaming media.  Or I could dig into Privoxy to see
how things are done in there. It might be easier to adapt it into an
ICAP/eCAP service by changing its input/output functions to use the
ICAP/eCAP interface rather than TCP.

For now, I'm thinking that I'll just let HTTPS pass through without
modification and let Privoxy handle HTTP.  That seems to be the easiest
way to do things.


Re: Content Adaptation with HTTPs

Amos Jeffries
Administrator


On 21/08/17 08:06, Christopher Ahrens wrote:

> Part of it is to pre-shrink the pages to prevent saturating the tunnel.
> A lot of our recipients have low-cost internet connections (usually
> between 1 and 5 Mbps).  In my personal experience, the transformations
> are probably cutting about 75-80% of the excess garbage from websites.
>
> We're also looking at possibly building tiny x86 or ARM-based boxes that
> can be deployed to their homes to do caching to further reduce the load
> on their internet connections.  The biggest complaint we have is why it
> takes so long to load pictures and words especially since a lot of the
> pictures are the same page-to-page (I am having a very hard time arguing
> with them...)
>
> We can get a lot of hardware from local companies, but not so much in
> the way of software or services
>

You might be interested in the Store-ID feature then. Eliezer has done
some nice experiments with using object hashes to further reduce the
data transfer between a parent and child proxy when URL de-duplication
is not quite enough by itself.
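
For readers unfamiliar with Store-ID, the following Python sketch shows the URL de-duplication half of that idea: a helper that maps several mirror URLs to one cache key. It is a sketch under assumptions. The host names (`imgN.cdn.example.org`), the `.squid.internal` key, and the function names are made up for illustration, and the line format (`[channel-ID] URL [extras]` in, `OK store-id=...` or `ERR` out) should be checked against the `store_id_program` documentation for the Squid version in use. The object-hash experiments mentioned above are not covered here.

```python
import re
import sys
from typing import Optional, TextIO

# Hypothetical rule: img1.cdn.example.org, img2.cdn.example.org, ... all
# serve identical objects, so they should share a single cache entry.
MIRROR = re.compile(r"^https?://img\d+\.cdn\.example\.org/(?P<path>.+)$")


def store_id_for(url: str) -> Optional[str]:
    """Return a normalised cache key for url, or None to leave it alone."""
    m = MIRROR.match(url)
    if m:
        return "http://img.cdn.example.org.squid.internal/" + m.group("path")
    return None


def answer(line: str) -> str:
    """Build the helper reply for one request line."""
    parts = line.split()
    if not parts:
        return "ERR"
    channel = ""
    # With helper concurrency enabled, requests carry a numeric channel ID
    # that has to be echoed back at the start of the reply.
    if parts[0].isdigit() and len(parts) > 1:
        channel = parts[0] + " "
        parts = parts[1:]
    sid = store_id_for(parts[0])
    return channel + ("OK store-id=" + sid if sid else "ERR")


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Main loop for running as a helper: one reply per request line."""
    for line in stdin:
        stdout.write(answer(line) + "\n")
        stdout.flush()  # the proxy waits for each reply
```

Calling `serve()` at the bottom of the script would turn it into a long-running helper. Keeping `store_id_for` as a pure function makes the mapping rules easy to test without a proxy in the loop.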


Amos

Re: Content Adaptation with HTTPs

Alex Rousskov
In reply to this post by Christopher Ahrens
On 08/20/2017 02:06 PM, Christopher Ahrens wrote:
> I was planning on
> seeing how GreaseMonkey and ABP handle data streams since they seem to
> be able to handle streaming media.  Or dig into Privoxy to see how
> things are done in there. Might find it to be easier to adapt it as an
> ICAP/ECAP by changing its input / output functions to be ICAP/ECAP
> interface rather than TCP.

You may also be able to use Privoxy for HTTPS adaptation "as is" if you
write an eCAP or ICAP adapter that emulates both an HTTP client and an
HTTP server (and put Privoxy between them). We did HTTP emulation in
eCAP (for integration with a DPI product that could not do HTTPS) so I
am pretty sure that this is doable. However, I do not know whether it
would be easier to teach Privoxy to speak eCAP and/or ICAP instead.

Alex.

Re: Content Adaptation with HTTPs

Eliezer Croitoru
In reply to this post by Christopher Ahrens
Hey Christopher,

I don't know where you are working or what your time zone is, but before you jump into any adventure attacking the subject, I think it would be wise to understand the nature of the issue.

From my experience as an ISP sysadmin, I can tell you that many of the issues your clients/users are having might have no connection at all to the things you imagine.
The smart way, I believe, is to verify the current state of the system/setup and then work out what is required to address the relevant issues that you can overcome or improve.
Some issues are not related to CS at all but to expectations set by the brilliant minds of Hollywood or by commercials such as the Google Chrome one about speed:
https://www.youtube.com/watch?v=nCgQDjiotG0

and many others that are out there.

If you are up to the challenge of analyzing the current state of the system setup and finding the right (both technical and in-budget) solution for your needs, whether it already exists as an open source project or as a ready-to-use product, I will be happy to assist you with it.

If you are interested in a free consultation, just send me a private email with your TZ and/or the way you would like me to contact you.

My mobile is in the signature and I am also available on Skype as: elico2013
Feel free to contact me via Telegram or WhatsApp as well.

Eliezer

* My TZ is +3

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: [hidden email]


