[HTTPS-Everywhere] Automatic testing of rules to discover rules that broke (e.g. by site redesign)

Ondrej Mikle ondrej.mikle at nic.cz
Sat Apr 28 09:50:01 PDT 2012


Hi,

seeing how many times "broken rule" topics appear in https-everywhere-rules,
have you considered some automatic tests? (Re-reading the text of this mail
again, I might be attempting to solve a problem you may not really have.)

Assumption 1: the original rule is (mostly) correct - if the author didn't
notice an error, a test script might not either.
Breakage is then caused by an expired cert, a site redesign and such.
Assumption 2: a non-negligible percentage of rules is broken for the above
reasons (otherwise why waste time).

Idea for automatic detection of broken rules (might be more pain than it's
worth, so beware ;-) ) :

1. On rule submission, crawl part of the site (e.g. following up to N links
breadth-first or getting the links from a search engine)

2. Compare what the page source looks like with the rules applied and without
them, and note some measure of difference (Levenshtein distance or another
similarity measure, to account for ads and junk content). Watch out for simply
detectable errors - HTTP 40x, 50x vs 200, cert not passing PKIX validation,
maybe cycle detection. Store the scores in a DB.

3. Periodically run the same check and compare with the old results. If
something broke, it is likely that there will be a big difference between the
stored and the newly computed distance vectors - like an empty page or a bad
redirect vs the proper page.
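
Steps 2 and 3 could look roughly like the Python sketch below. It is only an
illustration under assumptions: the Fetch record, the function names and the
0.3 threshold are all made up here, fetching and crawling are left out, and
difflib's SequenceMatcher ratio stands in for Levenshtein distance as the
similarity measure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Fetch:
    """Result of fetching one URL (hypothetical record, not an existing API)."""
    status: int    # HTTP status code
    cert_ok: bool  # True if the certificate passed validation
    body: str      # page source


def similarity(plain: Fetch, rewritten: Fetch) -> float:
    """Score in [0, 1] comparing the page without and with the rule applied."""
    # Simply detectable errors: bad cert, or an error status that the
    # plain (non-rewritten) fetch did not have.
    if not rewritten.cert_ok:
        return 0.0
    if rewritten.status >= 400 and plain.status < 400:
        return 0.0
    # SequenceMatcher ratio is used instead of Levenshtein distance (assumption).
    matcher = SequenceMatcher(None, plain.body, rewritten.body, autojunk=False)
    return matcher.ratio()


def broken_urls(baseline, current, max_drop=0.3):
    """URLs whose score dropped by more than max_drop since the stored run.

    baseline and current map URL -> similarity score; a URL missing from
    current is treated as score 0.0. The threshold is an arbitrary guess.
    """
    return sorted(
        url for url, old in baseline.items()
        if old - current.get(url, 0.0) > max_drop
    )
```

The baseline dict would come from the DB filled at rule submission, and the
current dict from the periodic run; anything returned by broken_urls() would
then be flagged for a human to look at.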

The testing could likely be done by scripting Firefox instrumented from an addon
("headless").



Ondrej



