Disclosure:  I haven't the faintest notion about the exact algorithmic details of the NoScript/HTTPSEverywhere engine.  <br><br>In broad strokes, a naive pattern matcher coupled with a simple tree structure <i>might </i>be able to check matches in logarithmic (not linear) time, and <i>should </i>stop processing string characters as soon as a determinate match has been made (i.e. leaf node has been reached).  Throwing regex into the mix adds considerable power and flexibility, though usually at some expense to execution time.  I am curious as to the scale of the tradeoff (bytes of memory vs execution cycles).  With Mozilla making concerted efforts to bring Firefox's speed into the running with Chrome, it might be worth a look.  <br>

<br>For a marginally-relevant, holiday-themed illustration, try XKCD:   <a href="https://www.xkcd.com/835/">https://www.xkcd.com/835/</a><br><br><br><div class="gmail_quote">On Wed, Dec 22, 2010 at 8:30 PM, Drake, Brian <span dir="ltr"><<a href="mailto:brian2@drakefamily.tk">brian2@drakefamily.tk</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">How does this work? Will it continue to check every rule, even if it has already found a match? I assume that it’s just calling String.replace(), so the answer is yes. In that case, I don’t see how the naive patterns can possibly be faster.<br>


<br>Still, someone should look into it.<br><br><div class="gmail_quote"><div class="im">On Wed, Dec 22, 2010 at 2016 (UTC-8), Whizz Mo <span dir="ltr"><<a href="mailto:https@whizzmo.com" target="_blank">https@whizzmo.com</a>></span> wrote:<br>

</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex"><div class="im">

At the risk of bringing up space-time tradeoff, has anyone examined the engine's regex vs naive pattern matching speed?  <br><br></div><div class="im"><div class="gmail_quote"><div><div></div><div>On Wed, Dec 22, 2010 at 1841 (UTC-8), Drake, Brian <span dir="ltr"><<a href="mailto:brian2@drakefamily.tk" target="_blank">brian2@drakefamily.tk</a>></span> wrote:<br>


</div></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex"><div><div></div><div>That sounds good to me. I use far more complex “regexy” patterns than that. That’s what regexp is for, isn’t it?<br>


<br>

<div class="gmail_quote"><div>On Wed, Dec 22, 2010 at 1041 (UTC-8), Osama Khalid <span dir="ltr"><<a href="mailto:osamak@gnu.org" target="_blank">osamak@gnu.org</a>></span> wrote:<br>

</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex"><div>Hello,<br>

<br>

Most rules use "(www\.)?" to match URLs with and without the 'www.'<br>

prefix but few (~57 vs. 394) have two different patterns for each<br>

case.<br>

<br>

I suggest changing the few patterns to the regexy way.<br>

<br>

Should I send a patch?<br>

</div><font color="#888888"><br>

--Osama Khalid[snip]</font><br></blockquote></div><br>--<br>Brian Drake<br clear="all">[snip]</div></div></blockquote></div></div></blockquote></div><div class="im"><br>--<br>Brian Drake<br><br>Alternate (slightly less secure) e-mail: <a href="mailto:brian@drakefamily.tk" target="_blank">brian@drakefamily.tk</a><br>


Alternate (old) e-mail: <a href="mailto:brianriab@gmail.com" target="_blank">brianriab@gmail.com</a><br><br>Facebook profile: <a href="https://ssl.facebook.com/profile.php?id=100001206642672" target="_blank">Profile ID 100001206642672</a><br>


Twitter username: <a href="https://twitter.com/BrianJDrake" target="_blank">BrianJDrake</a><br>Wikimedia project username: <a href="https://secure.wikimedia.org/wikipedia/meta/wiki/User:Brianjd" target="_blank">Brianjd</a> (been inactive for a while)<br>


<br></div>All content created by me Copyright © 2010 Brian Drake. All rights reserved.<br>

</blockquote></div><br>