[HTTPS-Everywhere] Automatically generated rules

Dominik Frühwirt dominik.fruehwirt at gmx.at
Thu Mar 12 02:19:55 PDT 2015


Hi,

I'm currently finishing my master's thesis in computer science, which
addresses the HTTPS-E ruleset. Simply put, I try to generate rules for a
large number of websites automatically. To do so, I used an automated
browser (PhantomJS) to fetch the HTML source of the HTTP websites and of
the corresponding HTTPS websites. The HTTPS websites are found by trying
to reach the most frequently used subdomains of HTTPS-secured domains.
The retrieved sources are compared using different similarity matching
algorithms, and a pair is treated as a positive match when the computed
similarity value exceeds a certain threshold.
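For illustration, the comparison step can be sketched in a few lines of
Python. This is only a minimal sketch under stated assumptions, not the
code used in the thesis: the subdomain list and the threshold of 0.9 are
hypothetical placeholders, difflib's SequenceMatcher stands in for
whichever similarity algorithms were actually evaluated, and fetching the
pages with PhantomJS is left out.

```python
from difflib import SequenceMatcher

# Hypothetical list of "most frequently used" subdomains; the real list
# used for the generated rules is not given here.
COMMON_SUBDOMAINS = ["", "www", "m", "mail", "blog", "shop"]

# Hypothetical threshold; the actual value is only described as
# "a certain threshold".
SIMILARITY_THRESHOLD = 0.9


def candidate_hosts(domain: str) -> list[str]:
    """Build candidate host names by prefixing common subdomains."""
    return [f"{sub}.{domain}" if sub else domain for sub in COMMON_SUBDOMAINS]


def similarity(http_html: str, https_html: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two sources are identical."""
    return SequenceMatcher(None, http_html, https_html).ratio()


def is_positive_match(http_html: str, https_html: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Treat the HTTP/HTTPS pair as equivalent if similarity exceeds threshold."""
    return similarity(http_html, https_html) > threshold


if __name__ == "__main__":
    print(candidate_hosts("example.com"))
    page = "<html><body>Hello</body></html>"
    print(is_positive_match(page, page))
```

SequenceMatcher can be slow on large HTML documents, so a real pipeline
would probably prefer a cheaper metric; the sketch only shows the shape
of the threshold decision.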

I took Alexa's top million websites as input and generated about 89,000
rules from them. Since only the landing pages are compared, special cases
such as resources that are available on a certain path via HTTP but not
via HTTPS are not considered (no exclusion patterns). In general, the
generated rules are not as accurate as the community-written ones. Hence,
manually created rules should take precedence over the generated ones
when the two are merged into the current ruleset.
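A rule without exclusion patterns boils down to a list of target hosts
plus the trivial HTTP-to-HTTPS rewrite. The sketch below, assuming the
usual HTTPS Everywhere ruleset XML (`<ruleset>`, `<target host>`,
`<rule from to>`), builds such rulesets and merges them so that hosts
already covered by manually written rules are skipped. The function names
and the input shape (a set of manually covered hosts and a dict of
generated rulesets) are hypothetical, and wildcard targets such as
`*.example.com` are not handled.

```python
import xml.etree.ElementTree as ET


def build_ruleset(name: str, hosts: list[str]) -> ET.Element:
    """Create a ruleset that rewrites every target host from HTTP to HTTPS.

    Only landing pages were compared, so no <exclusion> elements are emitted.
    """
    ruleset = ET.Element("ruleset", name=name)
    for host in hosts:
        ET.SubElement(ruleset, "target", host=host)
    # "from" is a Python keyword, so the attributes are passed as a dict.
    ET.SubElement(ruleset, "rule", {"from": "^http:", "to": "https:"})
    return ruleset


def merge_generated(manual_hosts: set[str],
                    generated: dict[str, list[str]]) -> list[ET.Element]:
    """Keep only generated hosts that no manual rule already covers.

    Hypothetical merge policy: exact host matches against manual rules are
    dropped, so community-written rules always take precedence.
    """
    merged = []
    for name, hosts in generated.items():
        remaining = [h for h in hosts if h not in manual_hosts]
        if remaining:  # skip rulesets that end up with no targets
            merged.append(build_ruleset(name, remaining))
    return merged


if __name__ == "__main__":
    manual = {"www.example.com"}
    generated = {"Example": ["example.com", "www.example.com"]}
    for rs in merge_generated(manual, generated):
        print(ET.tostring(rs, encoding="unicode"))
```

Dropping individual hosts rather than whole rulesets keeps the generated
coverage for hosts the community rules do not mention.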

One problem resulting from such a large ruleset concerns the browser UI.
When I select "Enable / Disable Rules" in the extension's menu, Firefox
stops responding and freezes completely, probably because it tries to
load the whole ruleset into the dialog's list. This problem should be
solved before a large set of rules is included.

Are you interested in incorporating the generated rules into the public
ruleset?

Kind regards,
Dominik Frühwirt

