[HTTPS-Everywhere] Important: New testing requirements for rulesets

Jacob Hoffman-Andrews jsha at eff.org
Mon Feb 9 12:11:57 PST 2015


Hi all,

As part of the maintainability effort, I'm introducing testing of
rulesets through hiviah's HTTPS Everywhere Checker:
https://github.com/hiviah/https-everywhere-checker. Checking all 14k
rulesets takes a long time, so for now I'll be running this manually and
disabling rulesets.

To make automated testing easier, I've introduced a new <test url="..."
/> tag for rulesets. This allows ruleset authors to specify URLs that
should be fetched to verify that the ruleset still works. I've
introduced testing code to ensure that all new or changed rulesets have
sufficient test cases. Details below. I'll also be adding this
documentation to the source tree shortly.

# Ruleset coverage requirements

Goals:
 100% coverage of all targets and all branches of all regexes in each
ruleset.

Each ruleset has a number of "implicit" test URLs based on the target
hosts. For
each target host e.g. example.com, there is an implicit test URL of
http://example.com/. Exception: target hosts that contain a wildcard
("*.example.com") do
not create an implicit test URL.

Additional test URLs can be added with the new <test> tag in the XML, e.g.
<test url="http://example.com/complex-page">.

Test URLs will be matched against the regexes in each <rule> and
<exclusion>. A
test URL can only match against one <rule> and one <exclusion>. Once all the
test URLs have been matched up, we count the number of test URLs
matching each
<rule> and each <exclusion>, and make sure the count meets the minimum
number.
The minimum number of test URLs for each <rule> or <exclusion> is one
plus the
number of '*', '+', '?', or '|' characters in the regex. Since each of these
characters increases the complexity of the regex (usually increasing the
variety
of URLs it can match), we require correspondingly more test URLs to
ensure good
coverage.

TODO: We'd like to also require that there be at least three test URLs
for every
target host with a left-side wildcard, and at least ten test URLs for each
target host with a right-side wildcard. But this is not yet implemented.

# Example:
      <ruleset name="example.com">
        <target host="example.com" />
        <target host="*.example.com" />

        <test url="http://www.example.com/" />
        <test url="http://beta.example.com/" />

        <rule from="^http://([\w-]+\.)?dezeen\.com/"
            to="https://$1dezeen.com/" />

      </ruleset>

This ruleset has one implicit test URL from a target host
("http://example.com/"). The other target host has a wildcard, so creates no
implicit test URL. There's a single rule. That rule contains a '+' and a
'?', so
it requires a total of three matching test URLs. We add the necessary
test URLs
using explicit <test> tags.

# Testing and Continuous Build

Testing for rulest coverage is now part of the Travis CI continuous build.
Currently we only test rulesets that have been modified since February 2
2015.
Submitting changes to any ruleset that does not meet the coverage
requirements
will break the build. This means that even fixes of existing rules may
require
additional work to bring them up to snuff.

To run the tests locally, you'll need the https-everywhere-checker,
which is now
a submodule of https-everywhere. Run these commands to set it up:

    git submodule init
    git submodule update
    cd https-everywhere-checker
    pip install --user -r requirements.txt
    cd -
    ./test-ruleset-coverage.sh

Note you may also need to apt-get install libcurl4-openssl-dev so that
one of
the requirements in https-everywhere-checker can be satisfied.

To test a specific ruleset:

     python2.7
https-everywhere-checker/src/https_everywhere_checker/check_rules.py
https-everywhere-checker/checker.config.sample rules/Example.xml




More information about the HTTPS-Everywhere mailing list