[HTTPS-E Rulesets] From Development to Stable

Claudio Moretti flyingstar16 at gmail.com
Tue Jan 14 14:40:36 PST 2014


Hey Yan,

I had a bit of free time this evening, so I decided it was a good time to
learn Python :P

This script works with Python 3.3. I need somebody to check the code (my
eyes are sore at the moment). Note that the script works ONLY in a
directory that contains the src/chrome/content/rules folders (directory
tree below). Basically:

1) You unzip the Alexa Top 1M list in the same directory as the script
2) You generate a git diff with this command (it's also in the script -
you probably know a better alternative):
    git diff --name-status master..remotes/origin/stable src/chrome/content/rules >> newRules.diff
    and put the newRules.diff file in the same directory as merger.py and
top-1m.csv
3) You launch merger.py

What it does now is just:
- print "FOUND: src/chrome/content/rules/RULENAME.xml" if one of the
targets in the rule file is found in the Alexa list
- print "File not found: filename" for those files with a weird
encoding that were messing up my cool script (just two of them, can't
remember their names)
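
A minimal sketch of the "FOUND" check described above, not the attached merger.py itself. It assumes top-1m.csv uses a "rank,domain" layout and that each ruleset lists its hosts in <target host="..."> elements; the function names and the handling of a leading "*." or "www." are illustrative choices, not taken from the script.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_alexa_domains(csv_file):
    """Return the set of domains from a rank,domain CSV (assumed layout)."""
    return {row[1].strip().lower() for row in csv.reader(csv_file) if len(row) >= 2}


def ruleset_targets(xml_text):
    """Return the host attribute of every <target> element in a ruleset."""
    root = ET.fromstring(xml_text)
    return [t.get("host", "").lower() for t in root.iter("target")]


def matches_alexa(xml_text, domains):
    """True if any target host, minus a leading '*.' or 'www.', is listed."""
    for host in ruleset_targets(xml_text):
        for prefix in ("*.", "www."):
            if host.startswith(prefix):
                host = host[len(prefix):]
        if host in domains:
            return True
    return False


# Inline sample data stands in for top-1m.csv and a rule file.
alexa = load_alexa_domains(io.StringIO("1,google.com\n2,example.org\n"))
sample = (
    '<ruleset name="Example">'
    '<target host="www.example.org" />'
    '<rule from="^http:" to="https:" />'
    "</ruleset>"
)
print("FOUND" if matches_alexa(sample, alexa) else "not in list")
```

Holding the Alexa list in a set keeps each lookup cheap even with a million entries.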

It should be easy to tweak to - for example - copy a "FOUND" rule into a
specific directory somewhere else for review/merge with stable.
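
One way that tweak could look, as a sketch only: the helper name copy_for_review and the review directory are hypothetical, and pathlib needs a newer Python 3 than the 3.3 mentioned above.

```python
import shutil
from pathlib import Path


def copy_for_review(found_paths, review_dir):
    """Copy each found ruleset into review_dir, skipping files that are missing."""
    review = Path(review_dir)
    review.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in map(Path, found_paths):
        if not path.is_file():
            print("File not found:", path)
            continue
        # str() keeps the call compatible with older shutil versions.
        shutil.copy2(str(path), str(review / path.name))
        copied.append(path.name)
    return copied
```

It would be called with the list of "FOUND" paths and a target folder, e.g. copy_for_review(found, "review").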

During the run, it splits each line of the git diff into two fields: the
"action" letter (A, D, M) and the rule path. I'm considering *only* rules
with an "A", meaning new rules added.
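
The split can be sketched as follows, assuming the tab-separated "status, path" lines that git diff --name-status prints; added_rules is an illustrative name, not a function from merger.py. One thing worth double-checking: with git diff X..Y the letters describe the change going from X to Y, so a file that exists only in Y is the one listed as "A".

```python
def added_rules(diff_lines):
    """Yield paths whose status letter is 'A' in name-status output."""
    for line in diff_lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        status, path = parts
        if status == "A":
            yield path


# Sample lines standing in for the contents of newRules.diff.
sample = [
    "A\tsrc/chrome/content/rules/Example.xml\n",
    "M\tsrc/chrome/content/rules/Changed.xml\n",
    "D\tsrc/chrome/content/rules/Removed.xml\n",
]
print(list(added_rules(sample)))
```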

Directory tree:
\-
 --\src
    --\chrome
       --\content
          --\rules
             --\ [*.xml]
 -- merger.py
 -- newRules.diff
 -- top-1m.csv


(hope it's clear, please tell me if it's not. I'll try to do something
better :) )

Please, somebody, review the script and tell me (a) if it's actually
working and (b) what I can do to improve it!

Cheers,

Claudio


On Tue, Jan 14, 2014 at 6:18 AM, Yan Zhu <yan at eff.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
>
>
> On 01/13/2014 10:03 PM, Drake, Brian wrote:
>
> > I’m still concerned about the other part of my message. Right now,
> > it seems that, to review a ruleset properly, there are at least
> > four places that I need to check:
> >
> > 1. Mailing list archives
> > 2. trac.torproject.org bug tracker
> > 3. GitHub bug tracker
> > 4. Git (to find out the history of the ruleset, especially if I’m
> > using a stable release but want to account for ruleset changes in
> > the development branch)
> >
>
> Definitely open to suggestions about how to consolidate these, though
> I find that mailing list + 2 bug trackers is manageable as long as I'm
> not looking too far back in time (i.e., pre-December 2013).
>
> But usually, I consider a new ruleset to be properly reviewed if
> someone has built a test FF/Chrome extension with it included and
> tested it out.
>
> - -Yan
>
>
> >
> > -- Brian Drake
> >
> > All content created by me: Copyright
> > <http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html> ©
> > 2014 Brian Drake. All rights reserved.
> >
> > On Tue, Jan 14, 2014 at 0536 (UTC), Yan Zhu <yan at eff.org> wrote:
> >
> >
> >
> > On 01/13/2014 09:18 PM, Drake, Brian wrote:
> >> Maybe people could opt-in to … is this where we would say
> >> “telemetry”? We could collect information about how much the
> >> rules actually get used, as well as things like redirect loops,
> >> to try to determine if a rule has been tested enough with no
> >> problems being found.
> >
> > This is theoretically a good idea, except in practice there are
> > some obstacles:
> >
> > 1. Stuff like automatically detecting when a page appears "broken"
> > or even just Javascript redirects is really, really hard. People
> > have tried using metrics like the Levenshtein distance between the
> > DOM tree of the HTTP and HTTPS sites, but nothing so far really
> > works.
> >
> > 2. Given that automatically detecting breakage is tricky, it seems
> > that one of our best ways to figure out when something breaks is
> > to see how often users disable certain rules. This is hopefully
> > going to get merged soon (see other thread).
> >
> > 3. Info like "how often a rule gets used" is hard to collect
> > safely, in the sense that collecting enough of it tends to
> > inadvertently create the risk of deanonymizing users. EFF tries as
> > hard as it can not to collect and store fingerprintable data on its
> > servers. :)
> >
> >
> >> What we desperately need as well is an easy way to find any
> >> issues already reported with a ruleset.
> >
> >> For example, when I was working on boohoo.com, I found many
> >> rulesets in the development branch (but not yet in stable) that were
> >> relevant, carefully checked the rules in them, and found many
> >> issues [1]. But since I am not familiar with any of those
> >> domains, I might have missed something. Or I might have reported
> >> issues that were already known. I have no idea.
> >
> >> [1]
> >> https://lists.eff.org/pipermail/https-everywhere-rules/2014-January/001792.html
> >
> >
> >> -- Brian Drake
> >
> >> All content created by me: Copyright
> >> <http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html> ©
> >> 2014 Brian Drake. All rights reserved.
> >
> >> On Tue, Jan 14, 2014 at 0435 (UTC), Yan Zhu <yan at eff.org> wrote:
> >
> >
> >
> >> On 01/13/2014 06:29 PM, Drake, Brian wrote:
> >>> What is the process for moving a ruleset from the development
> >>> branch to the stable branch?
> >
> >> Thank you thank you thank you for asking that question. I opened
> >> a ticket for this exact problem a few weeks ago:
> >> https://trac.torproject.org/projects/tor/ticket/10310
> >
> >> Right now, the answer is "when yan or peter thinks it's
> >> important and probably been tested enough." I'll also merge
> >> something from dev to stable if someone pokes me about it
> >> specifically (ex: in the case of the stackexchange rule, since
> >> that was a blocker for Tor launching their own stackexchange
> >> site).
> >
> >> Anyway, whoever works on that ticket linked above gets my
> >> undying love.
> >
> >> -Yan
> >
> >
> >
> >
> >
>
> - --
> Yan Zhu                           yan at eff.org
> Technologist                      Tel  +1 415 436 9333 x134
> Electronic Frontier Foundation    Fax  +1 415 436 9993
> -----BEGIN PGP SIGNATURE-----
>
> iQEcBAEBCgAGBQJS1NbGAAoJENC7YDZD/dnsOpgIAIqPUvXXyi3pGHfIhZrlvDOi
> 1gszqGnBmipCwPepve5AHUgZw2u4rapqOb908KcPPF8L0AOE93tPgWG12RsmXwHh
> heNvgWDY+K1y+sCzd1vEm+0pm5gW/e3trrvat47tK3OZTTVC32n4i8ywLbGDheQ0
> pWxcFGsm/72+3Gz4h1H5VwdTHsSjd1VJgEqwlGXPn5a3eAqcpXWdEqUzgbnB8b4y
> +T+149FkXl8G4tUHjtAeEFqoTI04hS3b1S7/n6bRjyUnyohbQS/k59tchCobQHQm
> gY0m96U/wVATjTVZOqlx5o1h5tPUNdCukxGPHieNcZEXyHWTDqTsEz7qAnkL6lc=
> =rdhI
> -----END PGP SIGNATURE-----
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: merger.py
Type: text/x-python
Size: 2409 bytes
Desc: not available
URL: <https://lists.eff.org/pipermail/https-everywhere-rules/attachments/20140114/dc21a106/attachment.py>

