[HTTPS-Everywhere] Proposal: ruleset maintainers and test URLs

Jacob S Hoffman-Andrews jsha at eff.org
Wed Aug 13 07:32:29 PDT 2014


Hi all,

As HTTPS Everywhere encompasses more sites, we're having trouble 
validating and maintaining rulesets in a scalable way. I'd like to 
propose two small changes that should help keep things running 
smoothly: Every ruleset should have a maintainer, and every ruleset 
should have a set of test URLs.

The maintainer requirement is pretty straightforward. We would 
probably specify it on the top-level ruleset node, e.g.:

<ruleset name="Twitter" maintainer="jsha at eff.org (Jacob Hoffman-Andrews)">

For the test URLs, we would like at least one URL that exercises 
each rewriting rule, so it makes sense to hang test URLs off of the 
rule tag:

(old)
<rule from="^http://(?:www\.)?t\.co/"
     to="https://t.co/" />

(new)
<rule from="^http://(?:www\.)?t\.co/"
     to="https://t.co/">
   <test href="http://t.co/kU0aUmcm4u" />
   <test href="http://www.t.co/kU0aUmcm4u" />
</rule>

Maintainers would be responsible for choosing the right number of 
tests to get adequate coverage when a pattern covers multiple hosts, 
but there would be a test-enforced minimum of two URLs per rule. In 
the case of bare domain rewrites, the test URLs should cover both 
'/' and some page other than the root.
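The structural checks described above could be enforced with a small
script. The following is a rough sketch, not a finished tool. It
assumes the `maintainer` attribute and `<test href="...">` element
exactly as proposed here, and it assumes the `from` patterns (which
are JavaScript regexes) also happen to be valid Python regexes. The
names `check_ruleset` and `MIN_TESTS_PER_RULE` are made up for
illustration.

```python
import re
import xml.etree.ElementTree as ET

MIN_TESTS_PER_RULE = 2  # the "test-enforced minimum" from the proposal


def check_ruleset(xml_text):
    """Return a list of human-readable problems found in one ruleset."""
    problems = []
    ruleset = ET.fromstring(xml_text)

    # Every ruleset should name a maintainer on the top-level node.
    if not ruleset.get("maintainer"):
        problems.append("ruleset has no maintainer attribute")

    for rule in ruleset.iter("rule"):
        pattern = rule.get("from", "")
        urls = [test.get("href", "") for test in rule.findall("test")]

        if len(urls) < MIN_TESTS_PER_RULE:
            problems.append(f"{pattern}: only {len(urls)} test URL(s)")

        # A test URL is only useful if the rule would actually rewrite it.
        for url in urls:
            if not re.search(pattern, url):
                problems.append(f"{pattern}: {url} does not match")

    return problems


# Hypothetical input: the second test URL is https://, so the
# ^http:// pattern cannot match it.
EXAMPLE = r"""
<ruleset name="Twitter" maintainer="jsha at eff.org (Jacob Hoffman-Andrews)">
  <rule from="^http://(?:www\.)?t\.co/" to="https://t.co/">
    <test href="http://t.co/kU0aUmcm4u" />
    <test href="https://www.t.co/kU0aUmcm4u" />
  </rule>
</ruleset>
"""

for problem in check_ruleset(EXAMPLE):
    print(problem)
```

Run on the example, this flags only the https:// test URL, since a
URL that already uses HTTPS never exercises the rewrite.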

To test these URLs, we would first fetch each of them with curl. Any 
URL that returns a 4xx or 5xx status would be ignored for that run. 
A special maintainer mode would additionally flag those URLs for 
replacement.
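
The curl pre-filter could look roughly like the sketch below. The
function names and the `maintainer_mode` flag are hypothetical, and
treating a connection failure (curl reports status 000) the same as
a 4xx/5xx is an assumption, since the proposal only mentions 4xx and
5xx. The curl options used (`--silent`, `--output`, `--max-time`,
`--write-out '%{http_code}'`) are standard ones; `/dev/null` assumes
a Unix-like system.

```python
import subprocess


def fetch_status(url, timeout=10):
    """Return the HTTP status code curl reports for url, or 0 on failure."""
    result = subprocess.run(
        ["curl", "--silent", "--output", "/dev/null",
         "--max-time", str(timeout),
         "--write-out", "%{http_code}", url],
        capture_output=True, text=True)
    try:
        return int(result.stdout.strip())  # "000" becomes 0
    except ValueError:
        return 0


def is_failing(status):
    # 4xx/5xx per the proposal; 0 (no response) is an added assumption.
    return status == 0 or 400 <= status <= 599


def filter_urls(urls, fetch=fetch_status, maintainer_mode=False):
    """Split urls into (usable, ignored) for this run.

    `fetch` is injectable so the logic can be tested without network.
    """
    usable, ignored = [], []
    for url in urls:
        status = fetch(url)
        if is_failing(status):
            ignored.append((url, status))
        else:
            usable.append(url)
    if maintainer_mode:
        # Maintainer mode: flag failing test URLs for replacement.
        for url, status in ignored:
            print(f"REPLACE {url} (HTTP {status})")
    return usable, ignored
```

A maintainer would call `filter_urls(urls, maintainer_mode=True)`;
the regular run would only use the `usable` list it returns.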

After filtering out failing URLs, we'd load up a headless Firefox 
instance with the extension (see starting Travis config at 
https://github.com/EFForg/https-everywhere/pull/421), and load each 
test URL in turn. We would validate that the rewrite rule actually 
gets triggered, that the page itself gets a 200 response, and that 
no more than 3 subresources fail, where a failure means mixed 
content blocking or a 4xx or 5xx response.
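
The pass/fail decision for a single page load can be written down
independently of how Firefox is driven. The sketch below covers only
that decision. `PageLoad` and `Subresource` are hypothetical record
types that a browser harness would have to fill in; nothing here
talks to Firefox or the extension.

```python
from dataclasses import dataclass, field

MAX_BAD_SUBRESOURCES = 3  # threshold from the proposal


@dataclass
class Subresource:
    url: str
    status: int
    mixed_content_blocked: bool = False


@dataclass
class PageLoad:
    url: str
    rewrite_triggered: bool
    status: int
    subresources: list = field(default_factory=list)


def subresource_failed(sub):
    """A subresource fails if it was blocked or returned 4xx/5xx."""
    return sub.mixed_content_blocked or 400 <= sub.status <= 599


def validate_page_load(load):
    """Return a list of reasons the load fails; empty means it passes."""
    problems = []
    if not load.rewrite_triggered:
        problems.append("rewrite rule was not triggered")
    if load.status != 200:
        problems.append(f"page returned {load.status}, expected 200")
    bad = [s for s in load.subresources if subresource_failed(s)]
    if len(bad) > MAX_BAD_SUBRESOURCES:
        problems.append(f"{len(bad)} subresources failed "
                        f"(limit {MAX_BAD_SUBRESOURCES})")
    return problems
```

Returning a list of reasons rather than a boolean makes the daily
or weekly report easier to act on.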

We'd run ruleset validation for all URLs daily or weekly, and any 
changes to a given ruleset would automatically trigger validation 
for that ruleset.

When a ruleset fails during the daily/weekly check, we'd disable it 
- either manually or automatically - until someone has time to fix it.
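
The proposal leaves open how a ruleset would be disabled. One
possibility, assumed here rather than specified above, is to set the
`default_off` attribute on the ruleset node. A minimal sketch of the
automatic variant, with a hypothetical `disable_ruleset` helper:

```python
import xml.etree.ElementTree as ET


def disable_ruleset(xml_text, reason="failed ruleset test"):
    """Return the ruleset XML with default_off set to `reason`.

    Assumption: a default_off attribute on <ruleset> is what turns a
    ruleset off. Note that ElementTree drops XML comments and may
    change whitespace, so a real tool might edit the file differently.
    """
    root = ET.fromstring(xml_text)
    root.set("default_off", reason)
    return ET.tostring(root, encoding="unicode")


# Hypothetical ruleset used only to show the call.
SAMPLE = (
    '<ruleset name="Twitter">'
    '<rule from="^http://t\\.co/" to="https://t.co/" />'
    '</ruleset>'
)
print(disable_ruleset(SAMPLE))
```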

What do you think?

Thanks,
Jacob

