[HTTPS-Everywhere] The standard pattern for 'www.'

Whizz Mo https at whizzmo.com
Wed Dec 22 21:01:00 PST 2010


Disclosure: I haven't the faintest notion of the exact algorithmic
details of the NoScript/HTTPS Everywhere engine.

In broad strokes, a naive pattern matcher coupled with a simple tree
structure *might* be able to check matches in logarithmic (not linear) time
in the number of rules, and *should* stop processing string characters as
soon as a determinate match has been made (i.e. once a leaf node has been
reached). Throwing regex into the mix adds considerable power and
flexibility, though usually at some expense to execution time. I am curious
about the scale of the tradeoff (bytes of memory vs. execution cycles). With
Mozilla making concerted efforts to bring Firefox's speed into the running
with Chrome's, it might be worth a look.
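To make the tradeoff concrete, here is a minimal Python sketch of both
approaches. `HostTrie`, `first_match_linear`, and the rule names are
hypothetical; this is not the NoScript/HTTPS Everywhere engine's code. The
trie is keyed on hostname labels read from the right, so a lookup touches at
most one node per label and stops early once it reaches a leaf: its cost
depends on the hostname's length, not on the number of rules. The linear scan
tests regex patterns one by one, so its cost grows with the rule count. Purely
for illustration, the sketch assumes that a rule for a domain also covers its
subdomains.

```python
from __future__ import annotations

import re
from typing import Optional


class _Node:
    """One trie node: child labels plus an optional rule that ends here."""

    __slots__ = ("children", "rule")

    def __init__(self) -> None:
        self.children: dict[str, _Node] = {}
        self.rule: Optional[str] = None


class HostTrie:
    """Hypothetical host matcher keyed on reversed labels (com -> example -> www).

    Assumption for illustration: a rule for "eff.org" also covers its subdomains.
    """

    def __init__(self) -> None:
        self.root = _Node()

    def add(self, host: str, rule_name: str) -> None:
        node = self.root
        for label in reversed(host.lower().split(".")):
            node = node.children.setdefault(label, _Node())
        node.rule = rule_name

    def lookup(self, host: str) -> Optional[str]:
        node = self.root
        best: Optional[str] = None
        for label in reversed(host.lower().split(".")):
            node = node.children.get(label)
            if node is None:
                break  # no rule lies deeper along this path
            if node.rule is not None:
                best = node.rule  # most specific rule seen so far
            if not node.children:
                break  # leaf reached: the remaining labels cannot change the answer
        return best


def first_match_linear(
    patterns: list[tuple[str, re.Pattern[str]]], url: str
) -> tuple[Optional[str], int]:
    """Try each compiled pattern in order; return (rule name, patterns tested)."""
    for tested, (name, pattern) in enumerate(patterns, start=1):
        if pattern.match(url):
            return name, tested
    return None, len(patterns)


if __name__ == "__main__":
    trie = HostTrie()
    trie.add("example.com", "Example")
    trie.add("www.example.com", "Example-www")
    trie.add("eff.org", "EFF")
    print(trie.lookup("www.example.com"))  # Example-www
    print(trie.lookup("foo.eff.org"))      # EFF
    print(trie.lookup("example.net"))      # None

    patterns = [
        ("EFF", re.compile(r"^http://(www\.)?eff\.org/")),
        ("Example", re.compile(r"^http://(www\.)?example\.com/")),
    ]
    print(first_match_linear(patterns, "http://www.example.com/"))  # ('Example', 2)
```

The trie buys its speed by giving up regex expressiveness: it can only match
exact label sequences, which is the power-vs-flexibility cost mentioned above,
and it spends extra memory on one node per distinct label.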

For a marginally relevant, holiday-themed illustration, try XKCD:
https://www.xkcd.com/835/


On Wed, Dec 22, 2010 at 8:30 PM, Drake, Brian <brian2 at drakefamily.tk> wrote:

> How does this work? Will it continue to check every rule, even if it has
> already found a match? I assume that it’s just calling String.replace(), so
> the answer is yes. In that case, I don’t see how the naive patterns can
> possibly be faster.
>
> Still, someone should look into it.
>
> On Wed, Dec 22, 2010 at 20:16 (UTC-8), Whizz Mo <https at whizzmo.com> wrote:
>
>> At the risk of bringing up space-time tradeoff, has anyone examined the
>> engine's regex vs naive pattern matching speed?
>>
>> On Wed, Dec 22, 2010 at 18:41 (UTC-8), Drake, Brian <brian2 at drakefamily.tk> wrote:
>>
>>> That sounds good to me. I use far more complex “regexy” patterns than
>>> that. That’s what regexp is for, isn’t it?
>>>
>>> On Wed, Dec 22, 2010 at 10:41 (UTC-8), Osama Khalid <osamak at gnu.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> Most rules use "(www\.)?" to match URLs with and without the 'www.'
>>>> prefix, but a few (~57 vs. 394) have two separate patterns, one for
>>>> each case.
>>>>
>>>> I suggest changing those few rules to the regexy way.
>>>>
>>>> Should I send a patch?
>>>>
>>>> --Osama Khalid
>>>> [snip]
>>>>
>>>
>>> --
>>> Brian Drake
>>> [snip]
>>>
>>
> --
> Brian Drake
>
> Alternate (slightly less secure) e-mail: brian at drakefamily.tk
> Alternate (old) e-mail: brianriab at gmail.com
>
> Facebook profile: Profile ID 100001206642672 <https://ssl.facebook.com/profile.php?id=100001206642672>
> Twitter username: BrianJDrake <https://twitter.com/BrianJDrake>
> Wikimedia project username: Brianjd <https://secure.wikimedia.org/wikipedia/meta/wiki/User:Brianjd> (been inactive for a while)
>
> All content created by me Copyright © 2010 Brian Drake. All rights
> reserved.
>
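Osama's proposal can be illustrated with a small Python sketch. It uses a
hypothetical host, example.com, and Python's re module in place of the rule
engine (real rulesets are not written in Python, and Python writes the
back-reference as \1 where JavaScript's String.replace() uses $1). It checks
that one pattern with the optional "(www\.)?" group rewrites URLs exactly as
the pair of separate patterns does.

```python
import re

# Hypothetical host; this is a simplified stand-in, not real ruleset syntax.
# Style 1: two separate patterns, one per prefix (the ~57-rule style).
TWO_RULES = [
    (re.compile(r"^http://example\.com/"), "https://example.com/"),
    (re.compile(r"^http://www\.example\.com/"), "https://www.example.com/"),
]

# Style 2: one pattern with an optional group (the 394-rule style).
ONE_RULE = (re.compile(r"^http://(www\.)?example\.com/"), r"https://\1example.com/")


def rewrite_two(url: str) -> str:
    """Apply the two-pattern style, stopping at the first rule that fires."""
    for pattern, replacement in TWO_RULES:
        new_url, count = pattern.subn(replacement, url)
        if count:
            return new_url
    return url


def rewrite_one(url: str) -> str:
    """Apply the single optional-group pattern."""
    pattern, replacement = ONE_RULE
    # Since Python 3.5, an unmatched group (no "www.") expands to "" in \1.
    return pattern.sub(replacement, url)


if __name__ == "__main__":
    for url in ("http://example.com/a", "http://www.example.com/b", "http://example.org/"):
        print(url, "->", rewrite_one(url), rewrite_one(url) == rewrite_two(url))
```

The single pattern halves the number of rules to test for this host, at the
cost of one optional group in the regex.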


More information about the HTTPS-everywhere mailing list