[HTTPS-Everywhere] The standard pattern for 'www.'

Wed Dec 22 21:15:19 PST 2010

Just to be clear about where these details lie, I think HTTPS Everywhere
just uses the regex engine built into Firefox, but that still leaves the
question about what calls it makes (see my previous post).

I’ve seen that sort of analysis before, but not for pattern-matching. Can
you elaborate? I don’t see how it’s possible (but then I have no idea how
regex engines actually work when optimisations are included).

On Wed, Dec 22, 2010 at 2101 (UTC-8), Whizz Mo <https at whizzmo.com> wrote:

> Disclosure:  I haven't the faintest notion about the exact algorithmic
> details of the NoScript/HTTPSEverywhere engine.
>
> In broad strokes, a naive pattern matcher coupled with a simple tree
> structure *might *be able to check matches in logarithmic (not linear)
> time, and *should *stop processing string characters as soon as a
> determinate match has been made (i.e. leaf node has been reached).  Throwing
> regex into the mix adds considerable power and flexibility, though usually
> at some expense to execution time.  I am curious as to the scale of the
> tradeoff (bytes of memory vs execution cycles).  With Mozilla making
> concerted efforts to bring Firefox's speed into the running with Chrome, it
> might be worth a look.
>
> For a marginally-relevant, holiday-themed illustration, try XKCD:
> https://www.xkcd.com/835/
>
>
>
> On Wed, Dec 22, 2010 at 2030 (UTC-8), Drake, Brian <brian2 at drakefamily.tk>wrote:
>
>> How does this work? Will it continue to check every rule, even if it has
>> already found a match? I assume that it’s just calling String.replace(), so
>> the answer is yes. In that case, I don’t see how the naive patterns can
>> possibly be faster.
>>
>> Still, someone should look into it.
>>
>> On Wed, Dec 22, 2010 at 2016 (UTC-8), Whizz Mo <https at whizzmo.com> wrote:
>>
>>> At the risk of bringing up space-time tradeoff, has anyone examined the
>>> engine's regex vs naive pattern matching speed?
>>>
>>> On Wed, Dec 22, 2010 at 1841 (UTC-8), Drake, Brian <
>>> brian2 at drakefamily.tk> wrote:
>>>
>>>> That sounds good to me. I use far more complex “regexy” patterns than
>>>> that. That’s what regexp is for, isn’t it?
>>>>
>>>> On Wed, Dec 22, 2010 at 1041 (UTC-8), Osama Khalid <osamak at gnu.org>wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Most rules use "(www\.)?" to match URLs with and without the 'www.'
>>>>> prefix but few (~57 vs. 394) have two different patterns for each
>>>>> case.
>>>>>
>>>>> I suggest changing the few patterns to the regexy way.
>>>>>
>>>>> Should I send a patch?
>>>>>
>>>>> --Osama Khalid[snip]
>>>>>
>>>>
>>>> --
>>>> Brian Drake
>>>> [snip]
>>>>
>>>
>> --
>> Brian Drake
>> [snip]
>>
>
--
Brian Drake

Alternate (slightly less secure) e-mail: brian at drakefamily.tk
Alternate (old) e-mail: brianriab at gmail.com

Facebook profile: Profile ID
100001206642672<https://ssl.facebook.com/profile.php?id=100001206642672>
Twitter username: BrianJDrake <https://twitter.com/BrianJDrake>
Wikimedia project username:
Brianjd<https://secure.wikimedia.org/wikipedia/meta/wiki/User:Brianjd>(been
inactive for a while)

All content created by me Copyright © 2010 Brian Drake. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.eff.org/pipermail/https-everywhere/attachments/20101223/d15f3287/attachment.html>