[HTTPS-Everywhere] Automatically generated rules

Jacob Hoffman-Andrews jsha at eff.org
Thu Mar 12 10:23:38 PDT 2015


Hi Dominik! Thanks for working on HTTPS Everywhere for your master's
thesis.

Right now my first priority for HTTPS Everywhere is to make sure it
works reliably and doesn't break websites in any significant way.
Toward that end I've been integrating an automated ruleset checker
written by another community member:
https://github.com/hiviah/https-everywhere-checker. We currently
have approximately 14.5k rulesets. In February I disabled about 3k
of them that failed some of the automated tests.

Given the push towards quality, I definitely can't incorporate your
large set of auto-generated rules. But if you are interested in
modifying your tool to do deeper fetches beyond the first page, and
using it to detect errors in our existing rulesets, that would be
quite valuable.

You might also be interested in taking a look at this recent project
from another HTTPS Everywhere contributor:
https://github.com/guiweber/ssl-yeah

On 03/12/2015 02:19 AM, Dominik Frühwirt wrote:
> Hi,
> 
> I'm currently finishing my master's thesis in computer science, which
> addresses the ruleset of HTTPS-E. Simply put, I try to generate rules
> for a large number of websites automatically. To do so, I used an
> automated browser (PhantomJS) to fetch the HTML source of the HTTP
> websites and the corresponding HTTPS websites. The latter are found by
> trying to reach the most frequently used subdomains of HTTPS-secured
> domains. The retrieved sources are compared using different
> similarity matching algorithms and treated as a positive match when the
> calculated similarity value exceeds a certain threshold.
> 
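
For readers following the thread, the comparison step described in the
paragraph above can be sketched roughly as follows. This is only an
illustration, not the thesis's implementation: the message does not say
which similarity algorithms or which threshold were used, so Python's
difflib.SequenceMatcher and the 0.9 cutoff are assumptions, and the
function names are hypothetical. Fetching the pages (done with PhantomJS
in the thesis) is left out.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff; the original message only says "a certain threshold".
SIMILARITY_THRESHOLD = 0.9


def similarity(http_html: str, https_html: str) -> float:
    """Return a ratio in [0, 1] describing how alike the two sources are.

    SequenceMatcher is a stand-in for the unspecified similarity
    matching algorithms mentioned in the thread.
    """
    matcher = SequenceMatcher(None, http_html, https_html, autojunk=False)
    return matcher.ratio()


def is_positive_match(http_html: str, https_html: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Treat the HTTPS page as equivalent when similarity exceeds the threshold."""
    return similarity(http_html, https_html) > threshold


if __name__ == "__main__":
    page = "<html><body><h1>Welcome</h1></body></html>"
    error_page = "404 not found"
    print(is_positive_match(page, page))
    print(is_positive_match(page, error_page))
```

Because only the two landing pages are compared, a check like this says
nothing about paths that exist over HTTP but not over HTTPS.
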
> I took Alexa's top million websites as input and generated about
> 89,000 rules from it. Since only the landing pages are compared,
> particularities such as resources that are available on a certain path
> via HTTP but not via HTTPS are not considered (no exclusion patterns).
> Generally, the generated rules are not as accurate as the
> community-written ones. Hence, manually created rules should not be
> overridden when merging the generated rules into the current ruleset.
> 
> One problem resulting from such a large ruleset is the browser's UI.
> When I select "Enable / Disable Rules" in the extension's menu,
> Firefox freezes completely, probably because it tries to load the
> whole ruleset into the dialog's list. This problem should be solved
> before including a large set of rules.
> 
> Are you interested in incorporating the generated rules into the public
> ruleset?
> 
> Kind Regards
> Dominik Frühwirt