[HTTPS-Everywhere] HTTPS Everywhere for Chromium

Tue Nov 30 18:47:35 PST 2010

Adam, I still haven't had as much time as I would like to go over this, but
at a cursory glance it looks very promising.

As you noticed, the two most complicated cases seem to be Wikipedia and Google
Search.  We now have 350 rulesets, but I don't think any of the others exhibit
genuinely complex rewriting behaviour.

I think we could handle Wikipedia under your proposal by enumerating all of
the languages Wikipedia supports.  That might come to several thousand
entries in your rule format, though, if one supported
wik(ipedia|inews|isource|ibooks|iquote|iversity|tionary) for each language :(.
Would parsing so many rules be a load-time performance problem?
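
To give a rough feel for the size of that enumeration, here is a minimal
sketch. It builds only the "match" half of a rule for every language/project
pair. The language list is a made-up three-entry sample, the
<lang>.<project>.org hostname shape is an assumption, and the
enumerate_match_rules helper is hypothetical; none of this is taken from the
actual rulesets or from the attached toy program.

```python
import json
from itertools import product

# Hypothetical sample; the real list of language codes is much longer.
LANGUAGES = ["en", "de", "fr"]

# Project names expanded from
# wik(ipedia|inews|isource|ibooks|iquote|iversity|tionary).
PROJECTS = ["wik" + suffix for suffix in (
    "ipedia", "inews", "isource", "ibooks", "iquote", "iversity", "tionary")]


def enumerate_match_rules(languages, projects):
    """Return one match entry per (language, project) pair.

    Assumes hostnames of the form <lang>.<project>.org.
    """
    return [
        {"match": {"scheme": "http", "domain": f"{lang}.{project}.org"}}
        for lang, project in product(languages, projects)
    ]


rules = enumerate_match_rules(LANGUAGES, PROJECTS)
serialised = json.dumps(rules)

# The rule count grows as len(languages) * len(projects).
print(len(rules), "rules,", len(serialised), "bytes when serialised")
```

With 7 projects, a few hundred languages would already mean a couple of
thousand entries, so the load-time question depends mostly on how cheaply one
entry can be parsed.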

Can SPDY or something else fix the Google search case?

On Fri, Nov 26, 2010 at 11:55:34AM -0500, Adam Langley wrote:
> At the moment I don't believe that any of the experimental Chromium
> extension APIs[1] are sufficient to support HTTPS Everywhere and that
> there aren't any other plans to support it in Chromium.
> 
> In order to support HTTPS Everywhere in Chromium we have to live with
> some limitations which don't affect Firefox:
> 
> 1) Chromium will not let extensions excessively slow down normal
> execution. That means that it won't perform a round-trip to an
> extension process for each URL request in order to allow the extension
> to manipulate the request.
> 
> 2) Serialising and re-parsing URLs is very scary from a security point
> of view. It would be greatly preferable to handle URLs in their
> processed form.
> 
> 3) I'm not going to put a regexp engine into the Chromium networking
> stack for complexity reasons.
> 
> Based on this, I believe that it would be possible to support a
> limited form of request rewriting by extensions. The rewrite rules
> would have to be loaded into the networking stack, operate on parsed
> URLs and be much more limited than full regexps. As a short
> proof-of-concept I have attached a toy program which can process XML
> rules into such a format. Here's an example of the output:
> 
> // Scroogle http://(www\.)?scroogle\.org/cgi-bin/nbbw\.cgi$ https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi
> {
>   "match": {
>     "scheme": "http",
>     "domain": "www.scroogle.org",
>     "path": "/cgi-bin/nbbw.cgi",
>   },
>   "substitute": {
>     "scheme": "https",
>     "domain": "ssl.scroogle.org",
>     "matchedPath": "/cgi-bin/nbbwssl.cgi",
>   },
> },
> // Scroogle http://(www\.)?scroogle\.org/cgi-bin/nbbw\.cgi$ https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi
> {
>   "match": {
>     "scheme": "http",
>     "domain": "scroogle.org",
>     "path": "/cgi-bin/nbbw.cgi",
>   },
>   "substitute": {
>     "scheme": "https",
>     "domain": "ssl.scroogle.org",
>     "matchedPath": "/cgi-bin/nbbwssl.cgi",
>   },
> },
> 
> For the given regexp, the code generates all possible matching inputs
> (i.e. with, and without the "www."), applies the substitution to each
> and diffs the input and output to generate a rule.
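
The enumerate-then-substitute step described above can be sketched roughly as
follows. This is not the attached toy program: expand, unescape and make_rule
are hypothetical helpers, the expansion only understands escaped literals and
flat groups such as (a|b) or (a|b)?, and the target is assumed to contain no
back-references. Instead of diffing, it simply emits every field of the parsed
input and output URLs.

```python
import re
from itertools import product
from urllib.parse import urlsplit


def unescape(fragment):
    """Turn regexp escapes such as '\\.' back into literal characters."""
    return re.sub(r"\\(.)", r"\1", fragment)


def expand(pattern):
    """List every literal URL matched by a very restricted regexp.

    Only escaped literals and non-nested groups, optionally followed by
    '?', are handled; repeats and character classes are not.
    """
    parts = re.split(r"(\([^()]*\)\??)", pattern.rstrip("$"))
    choices = []
    for part in parts:
        if part.startswith("("):
            optional = part.endswith("?")
            body = part.rstrip("?")[1:-1]
            alternatives = [unescape(alt) for alt in body.split("|")]
            if optional:
                alternatives.append("")  # the group may be absent
            choices.append(alternatives)
        else:
            choices.append([unescape(part)])
    return ["".join(combo) for combo in product(*choices)]


def make_rule(pattern, target, url):
    """Apply the rewrite to one concrete URL and record both parsed forms."""
    rewritten = re.sub(pattern, target, url)
    src, dst = urlsplit(url), urlsplit(rewritten)
    return {
        "match": {"scheme": src.scheme, "domain": src.hostname,
                  "path": src.path},
        "substitute": {"scheme": dst.scheme, "domain": dst.hostname,
                       "matchedPath": dst.path},
    }


PATTERN = r"http://(www\.)?scroogle\.org/cgi-bin/nbbw\.cgi$"
TARGET = "https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi"

urls = expand(PATTERN)
rules = [make_rule(PATTERN, TARGET, url) for url in urls]
for rule in rules:
    print(rule)
```

For the Scroogle pattern this yields two concrete inputs, one with and one
without the "www." prefix, and therefore two rules.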
> 
> Many rules can be expressed in this simple form. Some exceptions:
> 
> 1) Anything dealing with query parameters for now
> 2) Rules which cannot be reasonably enumerated:
> 
> Skipping http://www\.google\.com/cse/intl/([^/:@][^/:@])/images/google_custom_search_watermark\.gif$
> due to remaining repeats or charsets
> 
> (although the case of "[^@:/]*" in the domain is specifically handled.)
> 
> 3) Trying to substitute parts of the domain into the path (in the
> "Wikipedia" ruleset)
> 
> There are also some cases which I believe are mistakes in the current
> rules. For example, some rules attempt to remap HTTPS urls:
> 
> 2010/11/26 11:34:08 Rule 'https?://(de-de|www\.)?facebook\.de/' in
> 'Facebook' maps non-HTTP urls (e.g. https://de-defacebook.de/)
> 2010/11/26 11:34:08 Rule 'https?://de-de\.facebook\.com/' in
> 'Facebook' maps non-HTTP urls (e.g. https://de-de.facebook.com/)
> 2010/11/26 11:34:08 Rule 'https?://(fr-fr|www\.)?facebook\.fr/' in
> 'Facebook' maps non-HTTP urls (e.g. https://fr-frfacebook.fr/)
> 2010/11/26 11:34:08 Rule 'https?://fr-fr\.facebook\.com/' in
> 'Facebook' maps non-HTTP urls (e.g. https://fr-fr.facebook.com/)
> 2010/11/26 11:34:08 Rule 'https?://finance\.google\.com/' in
> 'GoogleServices' maps non-HTTP urls (e.g. https://finance.google.com/)
> 2010/11/26 11:34:08 Rule 'https?://finance\.google\.co\.uk/' in
> 'GoogleServices' maps non-HTTP urls (e.g.
> https://finance.google.co.uk/)
> 
> (Note also that the Facebook rules appear to also be missing a period
> after, say, "de-de". Since I'm not sure if these are considered to be
> errors I haven't attached a patch to fix them, but can do if
> requested.)
> 
> Comments? I can't guarantee that Chromium would actually implement
> this: the extensions folks may have other ideas.
> 
> 
> AGL
> 
> [1] http://code.google.com/chrome/extensions/api_index.html#experimental
> 
> -- 
> Adam Langley agl at imperialviolet.org http://www.imperialviolet.org

-- 
Peter Eckersley                            pde at eff.org
Senior Staff Technologist         Tel  +1 415 436 9333 x131
Electronic Frontier Foundation    Fax  +1 415 436 9993