<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
The ruleset style guide is now checked in, but we are still
accepting modifications if you have suggestions on how it should be
different:
<a class="moz-txt-link-freetext" href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-style.md">https://github.com/EFForg/https-everywhere/blob/master/ruleset-style.md</a><br>
<br>
The main notable change here is that we are now encouraging explicit
listing of subdomains in &lt;target&gt; tags, *unless* you have a rule
that rewrites every subdomain automatically (but these are
rare-ish). This makes it much easier to achieve sufficient test URL
coverage.<br>
<br>
Here's an example of a rule updated for the new style:
<a class="moz-txt-link-freetext" href="https://github.com/EFForg/https-everywhere/pull/1050/files#diff-dd70919bf072f3db6882e974123c2058L30">https://github.com/EFForg/https-everywhere/pull/1050/files#diff-dd70919bf072f3db6882e974123c2058L30</a><br>
<br>
Thanks!<br>
<br>
--------------------------------------------------<br>
<h1>Ruleset Style Guide</h1>
<p>Goal: rules should be written in a way that is consistent, easy
for humans to
read and debug, reduces the chance of errors, and makes testing
easy.</p>
<p>To that end, here are some style guidelines for writing or
modifying rulesets.
They are intended to help and simplify in places where choices are
ambiguous,
but like all guidelines they can be broken if the circumstances
require it.</p>
<p>Avoid using the left-wildcard ("&lt;target
host='*.example.com'&gt;") unless you
really mean it. Many rules today specify a left-wildcard target,
but the
rewrite rules only rewrite an explicit list of hostnames.</p>
<p>Instead, prefer listing explicit target hosts and a single
rewrite from "^http:" to
"^https:". This saves you time as a ruleset author because each
explicit target
host automatically creates an implicit test URL, reducing the
need to add your
own test URLs. These also make it easier for someone reading the
ruleset to figure out
which subdomains are covered.</p>
<p>If you know all subdomains of a given domain support HTTPS, go
ahead and use a
left-wildcard, along with a plain rewrite from "^http:" to
"^https:". Make sure
to add a bunch of test URLs for the more important subdomains. If
you're not
sure what subdomains might exist, check the 'subdomain' tab on
Wolfram Alpha:
<a
href="http://www.wolframalpha.com/input/?i=_YOUR_DOMAIN_GOES_HERE_">http://www.wolframalpha.com/input/?i=_YOUR_DOMAIN_GOES_HERE_</a>.</p>
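<p>As a sketch (example.com and its subdomains here are placeholders,
not a real ruleset), such a wildcard ruleset might look like:</p>
<pre><code>&lt;ruleset name="Example.com"&gt;
  &lt;target host="example.com" /&gt;
  &lt;target host="*.example.com" /&gt;
  &lt;test url="http://www.example.com/" /&gt;
  &lt;test url="http://blog.example.com/" /&gt;
  &lt;test url="http://shop.example.com/" /&gt;
  &lt;rule from="^http:" to="https:" /&gt;
&lt;/ruleset&gt;
</code></pre>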
<p>If there are a handful of tricky subdomains, but most subdomains
can handle the
plain rewrite from "^http:" to "^https:", specify the rules for
the tricky
subdomains first, and then the plain rule last. Earlier rules
take
precedence, and processing stops at the first matching rule. There
may be a tiny
performance hit for processing exception cases earlier in the
ruleset and the
common case last, but in most cases the performance issue is
trumped by readability.</p>
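<p>For example, using a hypothetical domain where one subdomain needs
a special rewrite:</p>
<pre><code>&lt;ruleset name="Example.com"&gt;
  &lt;target host="example.com" /&gt;
  &lt;target host="www.example.com" /&gt;
  &lt;target host="cdn.example.com" /&gt;
  &lt;!-- Tricky case first: this host only serves HTTPS from another name --&gt;
  &lt;rule from="^http://cdn\.example\.com/" to="https://secure.example.com/" /&gt;
  &lt;!-- Plain rule last; it catches everything the rule above did not --&gt;
  &lt;rule from="^http:" to="https:" /&gt;
&lt;/ruleset&gt;
</code></pre>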
<p>Avoid regexes with long strings of subdomains, e.g. &lt;rule
from="^http://(foo|bar|baz|bananas).example.com" /&gt;. These are
hard to read and
maintain, and are usually better expressed with a longer list of
target hosts,
plus a plain rewrite from "^http:" to "^https:".</p>
<p>Prefer dashes over underscores in filenames. Dashes are easier to
type.</p>
<p>When matching an arbitrary DNS label (a single component of a
hostname), prefer
<code>([\w-]+)</code> for a single label (e.g. www), or <code>([\w-.]+)</code>
for multiple labels
(e.g. www.beta). Avoid more
visually complicated options like <code>([^/:@\.]+\.)?</code>.</p>
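<p>For instance, a rule that rewrites any single-label subdomain of a
hypothetical example.com:</p>
<pre><code>&lt;rule from="^http://([\w-]+)\.example\.com/" to="https://$1.example.com/" /&gt;
</code></pre>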
<p>For <code>securecookie</code> tags, it's common to match any
cookie name. For these, prefer
<code>.+</code> over <code>.*</code>. They are functionally
equivalent, but it's nice to be
consistent.</p>
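<p>A typical securecookie tag in this style, matching every cookie
name on every matched host, would be:</p>
<pre><code>&lt;securecookie host=".+" name=".+" /&gt;
</code></pre>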
<p>Avoid the negative lookahead operator <code>?!</code>. This is
almost always better
expressed using positive rule tags and negative exclusion tags.
Some rulesets
have exclusion tags that contain negative lookahead operators,
which is very
confusing.</p>
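<p>As a sketch with a hypothetical example.com: rather than a
lookahead such as <code>^http://(?!login\.)([\w-]+)\.example\.com/</code>,
carve the exception out with an exclusion tag and keep the rule
positive:</p>
<pre><code>&lt;exclusion pattern="^http://login\.example\.com/" /&gt;
&lt;rule from="^http:" to="https:" /&gt;
</code></pre>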
<p>Prefer capturing groups <code>(www\.)?</code> over non-capturing
<code>(?:www\.)?</code>. The
non-capturing form adds extra line noise that makes rules harder
to read.
Generally you can achieve the same effect by choosing a
correspondingly higher
index for your replacement group to account for the groups you
don't care about.</p>
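<p>For instance (again with a hypothetical example.com), converting a
non-capturing group to a capturing one simply shifts the group you
care about to a higher index:</p>
<pre><code>&lt;!-- Non-capturing form: the label is group 1 --&gt;
&lt;rule from="^http://(?:www|m)\.([\w-]+)\.example\.com/" to="https://$1.example.com/" /&gt;

&lt;!-- Preferred capturing form: the label becomes group 2 --&gt;
&lt;rule from="^http://(www|m)\.([\w-]+)\.example\.com/" to="https://$2.example.com/" /&gt;
</code></pre>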
<p>Here is an example ruleset today:</p>
<pre><code>&lt;ruleset name="WHATWG.org"&gt;
  &lt;target host="whatwg.org" /&gt;
  &lt;target host="*.whatwg.org" /&gt;
  &lt;rule from="^http://((?:developers|html-differences|images|resources|\w+\.spec|wiki|www)\.)?whatwg\.org/"
        to="https://$1whatwg.org/" /&gt;
&lt;/ruleset&gt;
</code></pre>
<p>Here is how you could rewrite it according to these style
guidelines, including
test URLs:</p>
<pre><code>&lt;ruleset name="WHATWG.org"&gt;
  &lt;target host="whatwg.org" /&gt;
  &lt;target host="developers.whatwg.org" /&gt;
  &lt;target host="html-differences.whatwg.org" /&gt;
  &lt;target host="images.whatwg.org" /&gt;
  &lt;target host="resources.whatwg.org" /&gt;
  &lt;target host="*.spec.whatwg.org" /&gt;
  &lt;target host="wiki.whatwg.org" /&gt;
  &lt;target host="www.whatwg.org" /&gt;
  &lt;test url="http://html.spec.whatwg.org/" /&gt;
  &lt;test url="http://fetch.spec.whatwg.org/" /&gt;
  &lt;test url="http://xhr.spec.whatwg.org/" /&gt;
  &lt;test url="http://dom.spec.whatwg.org/" /&gt;
  &lt;rule from="^http:"
        to="https:" /&gt;
&lt;/ruleset&gt;
</code></pre>
<br>
</body>
</html>