[HTTPS-Everywhere] Automatically generated rules

Dominik Frühwirt dominik.fruehwirt at gmx.at
Thu Mar 12 11:44:46 PDT 2015


Yes, you're right, it's a kind of combination of these tools. I see your
concerns about the generated rules, and since you're aiming for a
perfectly working ruleset, I understand that you don't want them to be
included.

If you'd like to have the results of my evaluation of different
similarity matching algorithms for improving the
https-everywhere-checker tool, I will gladly provide my thesis as soon
as I've finished writing.

KR,
Dominik

On 2015-03-12 19:12, Maxim Nazarenko wrote:
> As far as I understand, the project looks like HTTPS-Finder and
> https-everywhere-checker combined. Dominik, is that right? Writing
> rules is more art than science, I am afraid, and therefore I share
> Numismatika's concerns, but what is interesting is detecting whether
> http and https versions are essentially the same. Right now
> https-everywhere-checker has two different metrics, each with a more
> or less arbitrary threshold; please correct me if I am wrong. Some
> statistical data on which method is "better" and which threshold is
> "reasonable" would be interesting, IMHO.
> 
> Best regards,
> Maxim Nazarenko
> 
> On 12 March 2015 at 12:54, Numismatika
> <numismatika-everywhere at eclipso.ch> wrote:
>> I do not really trust the quality of rules that were generated without
>> any human interaction.
>> I think the better approach is to have something like
>> https://github.com/kevinjacobs/HTTPS-Finder/.
>> That is, a setting that makes the addon notify you when it detects
>> unsecured content that is also available over TLS.
>> If you fix up the above-mentioned addon and resolve its shortcomings,
>> I think we would benefit more than we would from a few thousand rules
>> without any quality assurance by a human eye.
>>
>> Numismatika
>>
>> On 12.03.2015 at 10:19, Dominik Frühwirt wrote:
>>> Hi,
>>>
>>> I'm currently finishing my master's thesis in computer science, which
>>> addresses the ruleset of HTTPS-E. Simply put, I try to generate rules
>>> for a large number of websites automatically. To do this, I use an
>>> automated browser (PhantomJS) to fetch the HTML source of the HTTP
>>> websites and the corresponding HTTPS websites. These are found by
>>> trying to reach the most frequently used subdomains of HTTPS-secured
>>> domains. The retrieved sources are compared using different
>>> similarity matching algorithms, and a pair is treated as a positive
>>> match when the calculated similarity value exceeds a certain
>>> threshold.
>>>
>>> I took Alexa's top million websites as an input and generated about
>>> 89,000 rules out of it. Since only the landing pages are compared,
>>> particularities like resources that are available on a certain path via
>>> HTTP but not via HTTPS are not considered (no exclusion patterns).
>>> Generally, the generated rules are not as accurate as the
>>> community-written ones. Hence, manually created rules should not be
>>> overruled when merging the generated rules into the current ruleset.
>>>
>>> One problem resulting from such a large ruleset is the browser's UI.
>>> When I select "Enable / Disable Rules" in the extension's menu,
>>> Firefox freezes completely, probably because it tries to load the
>>> whole ruleset into the dialog's list. This problem should be solved
>>> before including a large set of rules.
>>>
>>> Are you interested in incorporating the generated rules into the public
>>> ruleset?
>>>
>>> Kind Regards
>>> Dominik Frühwirt
>>> _______________________________________________
>>> HTTPS-Everywhere mailing list
>>> HTTPS-Everywhere at lists.eff.org
>>> https://lists.eff.org/mailman/listinfo/https-everywhere
> 
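
For readers following the thread, the comparison step described in the
original message (compare the HTTP and HTTPS page sources and accept the
pair when the similarity value exceeds a threshold) could look roughly
like the sketch below. It is only an illustration under assumptions:
difflib's SequenceMatcher stands in for the similarity algorithms
evaluated in the thesis, the 0.9 threshold is arbitrary, and the function
names are made up for this example. It does not reflect what
https-everywhere-checker actually does.

```python
from difflib import SequenceMatcher


def similarity(http_source: str, https_source: str) -> float:
    """Return a similarity value between 0.0 and 1.0 for two page sources.

    SequenceMatcher is an assumed stand-in metric, not the one from the thesis.
    """
    return SequenceMatcher(None, http_source, https_source).ratio()


def is_positive_match(http_source: str, https_source: str,
                      threshold: float = 0.9) -> bool:
    """Treat the pair as a match when the similarity exceeds the threshold.

    The default threshold of 0.9 is an arbitrary illustrative value.
    """
    return similarity(http_source, https_source) > threshold


if __name__ == "__main__":
    page = "<html><body>Hello</body></html>"
    print(is_positive_match(page, page))
```

In a real run the two sources would come from the automated browser; here
they are plain strings so the sketch stays self-contained.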
