[HTTPS-Everywhere] Python interface for the rulesets?

Ondrej Mikle ondrej.mikle at nic.cz
Thu Jan 3 07:30:50 PST 2013


On 12/26/2012 10:47 AM, Osama Khalid wrote:
> Has anyone used HTTPSEverywhere rulesets in Python?  I am thinking
> about writing a small script that can use the rulesets.  As far as I
> know, JavaScript and Python regex libraries are pretty much
> compatible, so it should not be too difficult, but it would be easier
> to find something ready.

I wrote a ruleset matcher in Python as part of the HTTPS Everywhere ruleset
checker (https://github.com/hiviah/https-everywhere-checker). Python's 're' module
and JS regexps are very similar but not identical; I had to use the 'regex' module
because of some limitations of the 're' module.

The differences between 're' and JS regexps I remember:

1) If the regexp was something like "^http://(www\.)?domain\.tld/" and the string
to be matched didn't contain the optional 'www.' part, the 're' module failed for
some reason (or the capture groups were numbered differently; I'm not quite sure
now).

2) JS regexps seem to support only capture groups 0-9, i.e. a target like
'to="https://$101.org/"' would rewrite only the "$1" part, whereas Python's 're'
and 'regex' modules would treat it as the 101st group. Thus the "$1" in the
substitution needed to be replaced with "\g<1>". See Rule.__init__() in
https://github.com/hiviah/https-everywhere-checker/blob/master/rules.py
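
The "$1" to "\g<1>" conversion from 2) can be sketched roughly as follows. This
is a minimal illustration, not the code from rules.py: the function names
js_to_py_replacement and rewrite and the example.org rules are made up for the
example. It assumes Python 3.5 or later, where an optional group that did not
take part in the match expands to an empty string in re.sub(); older versions of
're' raised an error there instead, which may be the failure described in 1).

```python
import re

def js_to_py_replacement(target):
    # Turn each JS-style "$N" (single digit) into Python's explicit
    # group reference, so "$101" becomes group 1 followed by literal "01".
    return re.sub(r"\$(\d)", r"\\g<\1>", target)

def rewrite(url, from_pattern, to_target):
    # Apply one rule to a URL; the URL comes back unchanged if the
    # pattern does not match.
    return re.sub(from_pattern, js_to_py_replacement(to_target), url, count=1)

# Point 2): "$101" is read as group 1 followed by the text "01".
print(rewrite("http://example.org/page",
              r"^http://(example)\.org/",
              "https://$101.org/"))

# Point 1): the optional "www." group is absent from the URL.
print(rewrite("http://example.org/",
              r"^http://(www\.)?example\.org/",
              "https://$1example.org/"))
```

Using the explicit group syntax avoids any ambiguity about where the group
number ends and the literal digits begin.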

The algorithm for matching rules is a bit different from the one HTTPS Everywhere
uses, and could in theory support multiple wildcard '*' parts (at the expense of
computational complexity). Otherwise I think both algorithms should be
functionally equivalent, at least for the rulesets that existed a couple of months
ago. But I didn't do any formal proof :-)
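
To illustrate what matching with multiple '*' parts could look like, here is a
small sketch. It is neither the checker's actual algorithm nor the one HTTPS
Everywhere uses: the helper names are hypothetical, and treating each '*' as
"one or more arbitrary characters" is a simplifying assumption. Every target is
tried against the host in turn, which is where the extra computational cost
would come from.

```python
import re

def target_to_regex(target):
    # Escape the literal pieces and join them with ".+", so each '*'
    # stands for one or more arbitrary characters (an assumption).
    parts = [re.escape(piece) for piece in target.split("*")]
    return re.compile(".+".join(parts))

def matching_targets(host, targets):
    # Linear scan: test the host against every target pattern.
    return [t for t in targets if target_to_regex(t).fullmatch(host)]

targets = ["example.com", "*.example.com", "*.example.*"]
print(matching_targets("www.example.com", targets))
print(matching_targets("example.com", targets))
```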

Ondrej
