[HTTPS-Everywhere] Automatically generated rules

Maxim Nazarenko nz.phone at mail.ru
Thu Mar 12 11:12:01 PDT 2015


As far as I understand, the project looks like HTTPS-Finder and
https-everywhere-checker combined. Dominik, is that right? Writing
rules is more art than science, I am afraid, so I share Numismatika's
concerns. What is interesting, though, is detecting whether the HTTP
and HTTPS versions of a page are essentially the same. Right now
https-everywhere-checker has two different metrics, each with a more
or less arbitrary threshold; please correct me if I am wrong. Some
statistical data on which method is "better" and which threshold is
"reasonable" would be interesting, IMHO.
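
To make the threshold question concrete, here is a minimal sketch of how
such statistics could be gathered from a set of hand-labelled page pairs.
It is not what https-everywhere-checker does: difflib.SequenceMatcher is
only a stand-in for a real similarity metric, and the function names and
the labelled input are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(http_body: str, https_body: str) -> float:
    """Return a similarity ratio in [0, 1] for two page sources.

    SequenceMatcher is a placeholder metric (an assumption), not one of
    the metrics used by https-everywhere-checker.
    """
    return SequenceMatcher(None, http_body, https_body, autojunk=False).ratio()


def sweep_thresholds(labelled_pairs, thresholds):
    """Compare 'similarity >= threshold' against human labels.

    labelled_pairs: iterable of (http_body, https_body, is_same) tuples,
                    where is_same is a human judgement (hypothetical data).
    thresholds:     iterable of candidate threshold values.
    Returns a list of (threshold, precision, recall) tuples.
    """
    # Score every pair once, then reuse the scores for each threshold.
    scored = [(similarity(a, b), same) for a, b, same in labelled_pairs]
    results = []
    for t in thresholds:
        tp = sum(1 for s, same in scored if s >= t and same)
        fp = sum(1 for s, same in scored if s >= t and not same)
        fn = sum(1 for s, same in scored if s < t and same)
        # Report 1.0 when a denominator is empty (nothing to get wrong).
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        results.append((t, precision, recall))
    return results
```

Running the same sweep with a second metric in place of similarity() would
give comparable numbers for both methods, which is the kind of data I mean.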

Best regards,
Maxim Nazarenko

On 12 March 2015 at 12:54, Numismatika
<numismatika-everywhere at eclipso.ch> wrote:
> I do not really trust the quality of rules that were generated without
> any human interaction.
> I think the better approach is to have something like
> https://github.com/kevinjacobs/HTTPS-Finder/
> with a setting that makes the addon notify you when it detects
> unsecured content that is also available over TLS.
> If you fix up the above-mentioned addon and resolve its
> shortcomings, I think we would benefit more than we
> would from a few thousand rules without any quality assurance by a human
> eye.
>
> Numismatika
>
> On 12.03.2015 at 10:19, Dominik Frühwirt wrote:
>> Hi,
>>
>> I'm currently finishing my master's thesis in computer science, which
>> addresses the ruleset of HTTPS-E. Simply put, I try to generate rules
>> for a large number of websites automatically. To do this, I used an
>> automated browser (PhantomJS) to fetch the HTML source of the HTTP
>> websites and the corresponding HTTPS websites. These are found by
>> trying to reach the most frequently used subdomains of HTTPS-secured
>> domains. The retrieved sources are compared using different
>> similarity matching algorithms and treated as a positive match when the
>> calculated similarity value exceeds a certain threshold.
>>
>> I took Alexa's top million websites as input and generated about
>> 89,000 rules from them. Since only the landing pages are compared,
>> particularities like resources that are available on a certain path via
>> HTTP but not via HTTPS are not considered (no exclusion patterns).
>> Generally, the generated rules are not as accurate as the
>> community-written ones. Hence, manually created rules should not be
>> overridden when merging the generated rules into the current ruleset.
>>
>> One problem resulting from such a large ruleset is the browser's UI.
>> When I select "Enable / Disable Rules" in the menu of the extension,
>> Firefox stops working and freezes completely, probably because it tries
>> to load the whole ruleset into the dialog's list. This problem should be
>> solved before including a large set of rules.
>>
>> Are you interested in incorporating the generated rules into the public
>> ruleset?
>>
>> Kind Regards
>> Dominik Frühwirt
>> _______________________________________________
>> HTTPS-Everywhere mailing list
>> HTTPS-Everywhere at lists.eff.org
>> https://lists.eff.org/mailman/listinfo/https-everywhere

