[HTTPS-Everywhere] what percentage of https everywhere rules are "simple" upgrades?

Mon Jan 5 10:00:47 PST 2015

Hi all, the w3c web application security list has been discussing
"optimistic" upgrades for mixed content (essentially, changing all mixed
content resources on a page from HTTP to HTTPS in hopes that the HTTPS
version is semantically equivalent to the HTTP page). See
http://lists.w3.org/Archives/Public/public-webappsec/2015Jan/0004.html.

As Brad mentioned, it would be useful to get HTTPS Everywhere data on
which percentage of HTTPS-E rules are "simple" (ex: http://example.com
to https://example.com) vs not (ex: http://example.com to
https://www.example.com).

Unfortunately this doesn't show whether the HTTPS version
exists-but-is-semantically-different-from-HTTP or doesn't exist at all.
An example of the former is http://forbes.com vs https://forbes.com.
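
One rough way to estimate the "simple" share is sketched below. It is only
a sketch under stated assumptions: the sample XML is hypothetical, the
<rulesets> wrapper element is made up for the example, rule patterns are
assumed to behave the same under Python's re module as under JavaScript,
and "simple" is taken to mean "only the scheme changes for every target
host". It inherits the limitation noted above: it says nothing about
whether the HTTPS version exists or is equivalent.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the HTTPS Everywhere ruleset XML format. The
# <rulesets> wrapper is an assumption made for this sketch only.
SAMPLE_RULESETS = r"""
<rulesets>
  <ruleset name="Example (scheme only)">
    <target host="example.com" />
    <rule from="^http://example\.com/" to="https://example.com/" />
  </ruleset>
  <ruleset name="Example (adds www)">
    <target host="example.org" />
    <rule from="^http://example\.org/" to="https://www.example.org/" />
  </ruleset>
  <ruleset name="Example (optional www kept)">
    <target host="example.net" />
    <target host="www.example.net" />
    <rule from="^http://(www\.)?example\.net/" to="https://$1example.net/" />
  </ruleset>
</rulesets>
"""


def js_to_py_replacement(to):
    """Convert JavaScript-style $1 backreferences to Python's \\g<1> form."""
    return re.sub(r"\$(\d)", r"\\g<\1>", to)


def is_simple(ruleset):
    """True if, for every target host, the first matching rule changes
    only the scheme (http -> https) of a test URL on that host."""
    hosts = [t.get("host") for t in ruleset.findall("target")]
    rules = ruleset.findall("rule")
    if not hosts or not rules:
        return False
    for host in hosts:
        if "*" in host:
            # Wildcard targets are not expanded here; count as not simple.
            return False
        url = "http://%s/path?q=1" % host
        rewritten = url
        for rule in rules:
            new, count = re.subn(rule.get("from"),
                                 js_to_py_replacement(rule.get("to")),
                                 url, count=1)
            if count:
                rewritten = new
                break  # only the first matching rule is applied
        if rewritten != "https" + url[len("http"):]:
            return False
    return True


def simple_fraction(xml_text):
    """Return (number of simple rulesets, total rulesets)."""
    rulesets = ET.fromstring(xml_text).findall("ruleset")
    return sum(1 for rs in rulesets if is_simple(rs)), len(rulesets)


simple, total = simple_fraction(SAMPLE_RULESETS)
print("%d of %d sample rulesets are simple" % (simple, total))
```

A real measurement would run the same check over every ruleset file
rather than a hand-written sample.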

Thoughts?

-Yan

-------- Forwarded Message --------
Subject: 	Re: [MIX] Require HTTPS scripts to be able to anything HTTP
scripts can do.
Date: 	Mon, 5 Jan 2015 05:24:27 -0500
From: 	Tim Berners-Lee <timbl at w3.org>
To: 	Brad Hill <hillbrad at gmail.com>
CC: 	public-webappsec at w3.org <public-webappsec at w3.org>

On 2015-01-02, at 16:14, Brad Hill <hillbrad at gmail.com> wrote:

> Tim,
>  
> Thanks for chiming in.  If I may endeavor to clarify a couple of things:
>
>
>     I would like to introduce a requirement:
>
>     *No script which works when served as http: should fail when
>     instead served from https:*
>
>
> There is a distinction here that the phrasing of this requirement
> misses - between the behavior of a secure document context and the
> location from which a script is served.

You are right of course.    I guess the requirement needs to be
rewritten more like:

>     *No script which works when served as http: within an http: web
>     page should fail when instead served from https: within a https:
>     web page*

I was assuming that all of the original web app (html, js, css, images)
was coming from the same server.

>
> In the strict sense, this requirement as phrased above is already
> correct and would not be changed by the proposed Mixed Content draft.
>  e.g. If I load the insecure resource: 
>
> http://example.com/foo.html
>
> and it contains the following markup:
>
> <script src=http://example.com/scripts/doStuff.js>
>
> that link can ALWAYS be changed to:
>
> <script src=https://example.com/scripts/doStuff.js>
>
> and nothing will break.

But when you at the same time put that within
https://example.com/pages/p1.html instead of
http://example.com/pages/p1.html, things break.

>
> Also, if I load the secure resource:
>
> https://example.com/foo.html
>
> and it contains the following markup:
>
> <script src=http://example.com/scripts/doStuff.js>
>
> That script today will ALWAYS be blocked from loading by modern
> browsers, for some years now. 
>
> So any script which today can be served over http and be allowed to
> execute can be upgraded to https and continue to work.

But only within insecure web apps.   Yes, it is the secure outer context
which is relevant.
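
The four page/script combinations discussed above can be summarised in a
few lines. This is a deliberately simplified model, not how any browser
implements mixed content blocking: the function name script_loads is
made up for illustration, and only the two URL schemes are considered.

```python
from urllib.parse import urlsplit


def script_loads(page_url, script_url):
    """Simplified model: a script is blocked only when an https: page
    references it through an http: URL."""
    page_scheme = urlsplit(page_url).scheme
    script_scheme = urlsplit(script_url).scheme
    return not (page_scheme == "https" and script_scheme == "http")


for page in ("http://example.com/foo.html", "https://example.com/foo.html"):
    for script in ("http://example.com/scripts/doStuff.js",
                   "https://example.com/scripts/doStuff.js"):
        print(page, script, script_loads(page, script))
```

Only the secure page with the insecure script reference comes out as
blocked, which is the "secure outer context" case.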

> The rest of your message seems to indicate you're not actually or
> exclusively concerned with where/how a script is served, but with
> whether secure document contexts can load insecure content.
>
> There are many legitimate reasons to be concerned about introducing
> insecure content into a page that has made some assertions to a user
> about its security, as Michal has already pointed out in his reply.
>
> If I might suggest a pivot here along the lines of the compatibility
> and path forward you (and we all) desire, perhaps we ought to discuss
> the possibility of automatic / optimistic upgrade from HTTP -> HTTPS
> for requests issued from a secure document context.  So if you load a
> script over https from a secure context, just auto-fixup all links to
> https.

It is an interesting idea to do automatic / optimistic upgrade when you
are accessing a legacy http: link from within a secure context.  I
suspect people pushed back on automatic / optimistic upgrade earlier in
general as it isn't as good, on the web in general, as leaving a
specific https: pointer for someone to follow.  But that was a general
case, while in this specific case it may be that the benefits outweigh
the concerns.
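
As a minimal sketch of what such an upgrade could look like: the function
name optimistic_upgrade is hypothetical, and the sketch only swaps the
scheme. It does not check that the https: resource exists or is
equivalent, and it does not handle URLs with an explicit port.

```python
from urllib.parse import urlsplit


def optimistic_upgrade(document_url, resource_url):
    """Rewrite an http: resource URL to https: when the referring
    document is itself https:. Everything else is returned unchanged."""
    doc = urlsplit(document_url)
    res = urlsplit(resource_url)
    if doc.scheme == "https" and res.scheme == "http":
        # Assumption: same host and path, only the scheme differs.
        return res._replace(scheme="https").geturl()
    return resource_url


print(optimistic_upgrade("https://example.com/foo.html",
                         "http://example.com/scripts/doStuff.js"))
```

From an insecure document the link is left alone, so existing http:
pages would behave as they do today.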

More broadly:  The fact that something is referred to with a http: URL
should not

This won't solve the problem of loading data from open data servers from
a secure app, for example apps which access the linked open data cloud
(http://lod-cloud.net/), which is a mass of http: links, not all
maintained.

> We have always shied away from doing this in the past because there is
> no formal guarantee that the resource at an URL with the http scheme
> is semantically equivalent to one available at the "same" URL with the
> https scheme.

That is a massive question.  That was the core concern of my earlier
(August) posting to www-tag.
Message: http://lists.w3.org/Archives/Public/www-tag/2014Aug/0056.html
Thread:
http://lists.w3.org/Archives/Public/www-tag/2014Aug/thread.html#msg56

I think in fact we have to grasp it by the horns and say when client
code libraries can assume that the two are the same and when they cannot.

Part of the solution may be, for example, to say that if you serve an HSTS
header, then you are making the commitment that they are semantically
equivalent for the entire site.   Client code makes that assumption once
it sees the header.   RDF systems canonicalize the URLs, smushing
together the data they have on the thing identified with or without the
s as the same thing.
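
A sketch of how client code might act on that idea follows. Treating HSTS
as an equivalence commitment is the proposal above, not something the
HSTS header itself promises; the class and method names are hypothetical,
and only the max-age and includeSubDomains directives are read.

```python
from urllib.parse import urlsplit


def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into
    (max_age, include_subdomains). max_age is None if absent."""
    max_age = None
    include_subdomains = False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            max_age = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            include_subdomains = True
    return max_age, include_subdomains


class EquivalenceStore:
    """Remembers hosts that sent HSTS and, under the assumption proposed
    above, treats their http: and https: URLs as the same thing."""

    def __init__(self):
        self.hosts = {}  # host -> include_subdomains flag

    def observe(self, host, header_value):
        max_age, include_subdomains = parse_hsts(header_value)
        if max_age:
            self.hosts[host.lower()] = include_subdomains
        else:
            # A missing or zero max-age means no commitment is recorded.
            self.hosts.pop(host.lower(), None)

    def covered(self, host):
        host = host.lower()
        if host in self.hosts:
            return True
        labels = host.split(".")
        # A parent host covers this one only if it set includeSubDomains.
        return any(self.hosts.get(".".join(labels[i:]))
                   for i in range(1, len(labels)))

    def canonicalize(self, url):
        parts = urlsplit(url)
        if parts.scheme == "http" and parts.hostname and self.covered(parts.hostname):
            return parts._replace(scheme="https").geturl()
        return url


store = EquivalenceStore()
store.observe("example.com", "max-age=31536000; includeSubDomains")
print(store.canonicalize("http://www.example.com/data#thing"))
print(store.canonicalize("http://example.org/data#thing"))
```

An RDF store could run every incoming identifier through canonicalize
before merging, so that data about the http: and https: forms lands on
one node for covered hosts and stays separate for everything else.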

There are of course existing sites with completely randomly different
content under https and http.
Google will, I assume, know how many.   We may have to barricade those off
as legacy oddities.

>
> Perhaps that shyness is worth revisiting today in light of the broad
> push to move as much as possible to secure transports.  If resource
> authors simply started serving the same content over https as they do
> today over http, we could make vast improvements and avoid much of the
> pain mixed-content blocking creates for such transitions today.
>

The encryption of the link is easy to upgrade too -- one could just
include it in the next apache, node, etc.
Establishing the identity of the server is the rathole.

> The edge cases introduced by this kind of optimistic upgrade may very
> well be fewer and less harmful than those introduced by allowing
> insecure content into secure contexts.  In fact, the EFF probably
> already has a good amount of data on exactly this from the HTTPS
> Everywhere extension.
>
> What do you think?
>
> -Brad Hill
> (chair, webappsec)