[HTTPS-Everywhere] [Technologists] what percentage of https everywhere rules are "simple" upgrades?

Mon Jan 5 10:52:22 PST 2015

On Mon, Jan 05, 2015 at 10:00:47AM -0800, yan wrote:
> Hi all, the w3c web application security list has been discussing
> "optimistic" upgrades for mixed content (essentially, changing all mixed
> content resources on a page from HTTP to HTTPS in hopes that the HTTPS
> version is semantically equivalent to the HTTP page). See
> http://lists.w3.org/Archives/Public/public-webappsec/2015Jan/0004.html.

Thanks for flagging this!

> 
> As Brad mentioned, it would be useful to get HTTPS Everywhere data on
> which percentage of HTTPS-E rules are "simple" (ex: http://example.com
> to https://example.com) vs not (ex: http://example.com to
> https://www.example.com).

We have Python code to test rulesets; the question then is how to
produce a test population of URLs to sample with.  Is it a handful of
synthetic URLs per rule, or a population of real-world URLs?

Yahoo probably has the latter :)
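
To make the sampling question concrete, here is a minimal sketch of how a rewrite could be labelled "simple" by its effect on a sample URL rather than by inspecting the rule text. It is not the existing ruleset-testing code. The function names are hypothetical, and it assumes a rule can be reduced to a (from, to) regex pair whose "to" side uses JavaScript-style $1 back-references.

```python
import re


def is_simple_upgrade(original_url, rewritten_url):
    # "Simple" here means the rewrite changed nothing but http -> https.
    return (original_url.startswith("http://")
            and rewritten_url == "https://" + original_url[len("http://"):])


def apply_rule(from_pattern, to_template, url):
    # Assumption: the "to" side uses JavaScript-style $1 references,
    # which are converted to Python's \g<1> form before substitution.
    py_template = re.sub(r"\$(\d+)", r"\\g<\1>", to_template)
    return re.sub(from_pattern, py_template, url, count=1)


def simple_fraction(rule, sample_urls):
    # Fraction of the sample URLs actually rewritten by the rule for which
    # the rewrite was a scheme-only upgrade; None if the rule matched nothing.
    from_pattern, to_template = rule
    changed = []
    for url in sample_urls:
        rewritten = apply_rule(from_pattern, to_template, url)
        if rewritten != url:
            changed.append((url, rewritten))
    if not changed:
        return None
    simple = sum(1 for url, new in changed if is_simple_upgrade(url, new))
    return simple / len(changed)
```

Under this definition the same rule can be simple for one URL and not for another (a rule that adds "www." is a scheme-only change for URLs that already have it), which is why the choice of test population matters.
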
> 
> Unfortunately this doesn't show whether the HTTPS version
> exists-but-is-semantically-different-from-HTTP or doesn't exist at all.
> An example of the former is http://forbes.com vs https://forbes.com.
> 
> Thoughts?

We should definitely wade into this thread; I'm strongly of the view
that MCB is making HTTPS harder, and browsers should make the
opportunistic attempts even if they occasionally cause issues.  I might
be missing some horrible security flaw, but I can't think of it right
now.
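
For reference, the optimistic upgrade being discussed amounts to roughly the following. This is a sketch under stated assumptions, not what any browser implements: the function name is hypothetical, only the scheme is changed, and nothing here checks that the HTTPS resource exists or is equivalent to the HTTP one.

```python
from urllib.parse import urlsplit, urlunsplit


def optimistic_upgrade(document_url, subresource_urls):
    # Only a secure document context triggers the upgrade; an http: page
    # keeps its subresource URLs exactly as written.
    if urlsplit(document_url).scheme != "https":
        return list(subresource_urls)

    upgraded = []
    for url in subresource_urls:
        parts = urlsplit(url)
        if parts.scheme == "http":
            # Swap the scheme and keep everything else (host, path, query).
            # An explicit port such as :80 is left untouched in this sketch.
            upgraded.append(urlunsplit(parts._replace(scheme="https")))
        else:
            upgraded.append(url)
    return upgraded
```

Whether the upgraded URL serves the same content is exactly the open question in the thread; the sketch only shows the rewrite step.
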

> 
> -Yan
> 
> 
> 
> -------- Forwarded Message --------
> Subject: 	Re: [MIX] Require HTTPS scripts to be able to anything HTTP
> scripts can do.
> Date: 	Mon, 5 Jan 2015 05:24:27 -0500
> From: 	Tim Berners-Lee <timbl at w3.org>
> To: 	Brad Hill <hillbrad at gmail.com>
> CC: 	public-webappsec at w3.org <public-webappsec at w3.org>
> 
> 
> 
> 
> On 2015-01-02, at 16:14, Brad Hill <hillbrad at gmail.com> wrote:
> 
> > Tim,
> >  
> > Thanks for chiming in.  If I may endeavor to clarify a couple of things:
> >
> >
> >     I would like to introduce a requirement:
> >
> >     *No script which works when served as http: should fail when
> >     instead served from https:*
> >
> >
> > There is a distinction here that the phrasing of this requirement
> > misses - between the behavior of a secure document context and the
> > location from which a script is served.
> 
> You are right of course.    I guess the requirement needs to be
> rewritten more like:
> 
> >     *No script which works when served as http: within an http: web
> >     page should fail when instead served from https: within a https:
> >     web page*
> 
> 
> I was assuming that all of the original web app (html, js, css, images)
> was coming from the same server.
> 
> >
> > In the strict sense, this requirement as phrased above is already
> > correct and would not be changed by the proposed Mixed Content draft.
> >  e.g. If I load the insecure resource: 
> >
> > http://example.com/foo.html
> >
> > and it contains the following markup:
> >
> > <script src=http://example.com/scripts/doStuff.js>
> >
> > that link can ALWAYS be changed to:
> >
> > <script src=https://example.com/scripts/doStuff.js>
> >
> > and nothing will break.
> 
> When you at the same time put that within https://example.com/pages/p1.html
> instead of http://example.com/pages/p1.html then things break.
> 
> >
> > Also, if I load the secure resource:
> >
> > https://example.com/foo.html
> >
> > and it contains the following markup:
> >
> > <script src=http://example.com/scripts/doStuff.js>
> >
> > That script today will ALWAYS be blocked from loading by modern
> > browsers, for some years now. 
> >
> > So any script which today can be served over http and be allowed to
> > execute can be upgraded to https and continue to work.
> 
> But only within insecure web apps.   Yes, it is the secure outer context
> which is relevant.
> 
> > The rest of your message seems to indicate you're not actually or
> > exclusively concerned with where/how a script is served, but with
> > whether secure document contexts can load insecure content.
> >
> > There are many legitimate reasons to be concerned about introducing
> > insecure content into a page that has made some assertions to a user
> > about its security, as Michal has already pointed out in his reply.
> >
> > If I might suggest a pivot here along the lines of the compatibility
> > and path forward you (and we all) desire, perhaps we ought to discuss
> > the possibility of automatic / optimistic upgrade from HTTP -> HTTPS
> > for requests issued from a secure document context.  So if you load a
> > script over https from a secure context, just auto-fixup all links to
> > https.
> 
> 
> It is an interesting idea to do automatic / optimistic upgrade when you
> are accessing a legacy http: link from within a secure context.  I
> suspect people pushed back on automatic / optimistic upgrade earlier in
> general as it isn't as good, on the web in general, as leaving a
> specific https: pointer for someone to follow.  But that was a general
> case, while in this specific case it may be that the benefits outweigh
> the concerns.
> 
> More broadly:  The fact that something is referred to with a http: URL
> should not
> 
> This won't solve the problem of loading data from open data servers from
> a secure app.
> Like apps which access the linked open data cloud http://lod-cloud.net/
> which is a mass of http: links, not all maintained.
> 
> > We have always shied away from doing this in the past because there is
> > no formal guarantee that the resource at an URL with the http scheme
> > is semantically equivalent to one available at the "same" URL with the
> > https scheme.
> 
> That is a massive question.  That was the core concern of my earlier
> (August) posting to www-tag.
> Message: http://lists.w3.org/Archives/Public/www-tag/2014Aug/0056.html
> Thread:
> http://lists.w3.org/Archives/Public/www-tag/2014Aug/thread.html#msg56
> 
> I think in fact we have to grasp it by the horns and say when client
> code libraries can assume that the two are the same and when they are not.
> 
> Part of the solution may be, for example, to say that if you serve an HSTS
> header, then you are making the commitment that they are semantically
> equivalent for the entire site.   Client code makes that assumption once
> it sees the header.   RDF systems canonicalize the URLs, smushing
> together the data they have on the thing identified with or without the
> s as the same thing.
> 
> There are of course existing sites with completely randomly different
> content under https and http.
> Google will I assume know how many.   We may have to barricade those off
> as legacy oddities.
> 
> 
> 
> >
> > Perhaps that shyness is worth revisiting today in light of the broad
> > push to move as much as possible to secure transports.  If resource
> > authors simply started serving the same content over https as they do
> > today over http, we could make vast improvements and avoid much of the
> > pain mixed-content blocking creates for such transitions today.
> >
> 
> The encryption of the link is easy to upgrade too -- one could just
> include it in the next apache, node, etc.
> Establishing the identity of the server is the rathole.
> 
> > The edge cases introduced by this kind of optimistic upgrade may very
> > well be fewer and less harmful than those introduced by allowing
> > insecure content into secure contexts.  In fact, the EFF probably
> > already has a good amount of data on exactly this from the HTTPS
> > Everywhere extension.
> >
> > What do you think?
> >
> > -Brad Hill
> > (chair, webappsec)

-- 
Peter Eckersley                            pde at eff.org
Technology Projects Director      Tel  +1 415 436 9333 x131
Electronic Frontier Foundation    Fax  +1 415 436 9993