<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jan 26, 2015 at 10:41 PM, Jacob Hoffman-Andrews <span dir="ltr">&lt;<a href="mailto:jsha@eff.org" target="_blank">jsha@eff.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Better automated testing<br>

<br>

Right now we have 3.2k rulesets in the stable branch and 14k in master.<br>

It's time to promote the master branch to stable. Among other things,<br>

master has e10s support, while stable does not. However, there are a<br>

large number of rules that completely break their target sites. See for<br>

instance:  <a href="https://github.com/EFForg/https-everywhere/issues/529" target="_blank">https://github.com/EFForg/https-everywhere/issues/529</a><br>

<a href="https://github.com/EFForg/https-everywhere/issues/849" target="_blank">https://github.com/EFForg/https-everywhere/issues/849</a><br>

<a href="https://github.com/EFForg/https-everywhere/pull/931" target="_blank">https://github.com/EFForg/https-everywhere/pull/931</a><br>

<br>

Promoting master branch to stable will break many websites for many<br>

users, which is not okay. We have a set of ruleset tests from 2013<br>

(<a href="https://github.com/EFForg/https-everywhere/blob/master/src/chrome/content/ruleset-tests.js" target="_blank">https://github.com/EFForg/https-everywhere/blob/master/src/chrome/content/ruleset-tests.js</a>),<br>

but these are only designed to detect mixed content. Also, by default<br>

they run for all 14k rulesets, which means loading 24k URLs. And they<br>

run a very simple heuristic to figure out which URLs to load.<br>

<br>

We need to automate and improve the ruleset tests. We need to add a<br>

test-url syntax to our ruleset files so we can specify which URLs to<br>

load. Newly added rulesets must include test URLs, and we need to<br>

retroactively add test URLs to existing rulesets (at first with an<br>

automated process, then later with manual maintenance). And the ruleset<br>

tests need to produce output that is easily used to disable failing<br>

rulesets. I've broken this down into tasks here:<br>

<a href="https://github.com/EFForg/https-everywhere/labels/Ruleset%20Testing" target="_blank">https://github.com/EFForg/https-everywhere/labels/Ruleset%20Testing</a><br></blockquote><div><br></div><div>Hi Jacob,<br><br>I'm just putting this out there: would it be possible to write some kind of test (to run "offline", of course) that checks that all the content loaded in a page is available both before and after HTTPS-E's intervention?<br><br></div><div>What I mean is to load every CSS/JS/image resource on a specific test page to ensure that all the content loads equally over HTTP and HTTPS.<br><br></div><div>We would have to write exceptions to it, for example for content that is generated dynamically. But if we run some kind of "diff" (on text-ish content, CSS and JS) and set a threshold for how much of it can change before we issue a warning or an error, the automated tests could identify which pages are most likely broken.<br><br></div><div>With the error reports we could then check whether the page is actually (badly) broken and decide whether the rule should be enabled or disabled.<br><br></div><div>We could also put the threshold directly into the test rules themselves, so that a rule can "override" the default testing value.<br>Finally, if we choose to put the test-urls (and all that comes with them) in the rule itself, we will have to modify the script that builds the SQLite DB to remove the test cases from the rules before putting them in the DB.<br><br></div><div>Not doing this might negatively impact the performance of the extension (more text to load when loading the rule).<br><br></div><div>To avoid this, we could put the test files in a different folder and give each one exactly the same name as its "rule" file in the rules folder.<br></div><div>This separates the code, but the naming convention still allows us to match them, and we won't have issues with a rule getting modified by 
accident by the test developer :)<br><br></div><div>Any opinions on this? I just wrote down everything that came to mind, so if I said something exceedingly stupid, please let me know :)<br><br></div><div>Thanks,<br><br></div><div>Claudio<br></div></div></div></div>