[HTTPS-E Rulesets] Updates to Google.xml

Peter Eckersley pde at eff.org
Tue Jan 18 15:28:55 PST 2011


(Moving this to the -rules mailing list)

Osama, I've made a few inline comments on this patch below.

On Fri, Dec 24, 2010 at 11:29:36PM +0300, Osama Khalid wrote:
> Hello,
> 
> I attached a patch to the Google Search ruleset. It fixes the
> following:
> * URLs like: http://www.google.com.sa/webhp?hl=ar are not currently
>   handled. Fixing that by adding a general pattern that handles
>   everything related to webhp and removing duplicated old patterns
>   (which will be handled by the new pattern).
> * Using a webhp-like general pattern to merge two different patterns
>   that deal with "/search" and "/#".
> * URLs like: http://www.google.com/firefox?q=test are not supposed to
>   perform a search query, but the ruleset currently redirects them to
>   https://encrypted.google.com/search?q=test. Fixing that by
>   redirecting them to https://encrypted.google.com/webhp?q=test
> 
> This patch is licensed under GPLv2+. All my future contributions to
> the HTTPSEverywhere projects are under the same license unless
> otherwise noted.
> 
> --Osama Khalid

> diff --git a/src/chrome/content/rules/Google.xml b/src/chrome/content/rules/Google.xml
> index 812669c..dd8e4c9 100644
> --- a/src/chrome/content/rules/Google.xml
> +++ b/src/chrome/content/rules/Google.xml
> @@ -28,17 +28,14 @@
>    <!-- Some Google pages can generate naive links back to the
>         unencrypted version of encrypted.google.com, which is a
>         301 but theoretically vulnerable to SSL stripping. -->
> -
>    <rule from="^http://encrypted\.google\.com/"
>            to="https://encrypted.google.com/"/>
>  
>    <!-- The most basic case. -->
> -
>    <rule from="^http://(www\.)?google\.com/search" 
>            to="https://encrypted.google.com/search"/>
>  
>    <!-- A very annoying exception that we seem to need for the basic case -->
> -
>    <exclusion pattern="^http://(www\.)?google\.com/search.*tbs=shop" />
>    <exclusion pattern="^http://clients[0-9]\.google\.com/.*client=products.*" />
>    <exclusion pattern="^http://suggestqueries\.google\.com/.*client=products.*" />
> @@ -50,34 +47,26 @@
>    <!-- But not the forums, bizarrely. -->
>    <exclusion pattern="^http://www\.google\.com/support/forum([\?/].*)?$"/>
>  
> -  <!-- There are two distinct cases for these firefox searches -->
> -
> -  <rule from="^http://(www\.)?google\.com/firefox/?$" 
> -          to="https://encrypted.google.com/"/>
> -
> -  <rule from="^http://(www\.)?google\.com/firefox" 
> -          to="https://encrypted.google.com/search"/>
> -
> -  <rule from="^http://(www\.)?google\.com/webhp" 
> +  <!-- Firefox homepage isn't currently available in HTTPS.-->
> +  <rule from="^http://(www\.)?google\.com/firefox/?" 
>            to="https://encrypted.google.com/webhp"/>

I presume the "/?" here is a bug?  The similar preexisting construction was
"/?$", which would only strip a slash if it is the last character in the URI.
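
To make the difference concrete, here is a minimal Python sketch (not part of
the patch). It assumes Python's re module treats these two patterns the same
way the extension's JavaScript regexps do, and the rewrite() helper is
hypothetical. The anchored pattern is the old "/?$" form pointed at the new
/webhp target purely for comparison; the old rule actually went to "/".

```python
import re

TARGET = "https://encrypted.google.com/webhp"

# Pattern from the patch: optional trailing slash, no end anchor.
UNANCHORED = r"^http://(www\.)?google\.com/firefox/?"
# Preexisting style: optional trailing slash, anchored at the end of the URI.
ANCHORED = r"^http://(www\.)?google\.com/firefox/?$"


def rewrite(pattern, url):
    """Apply one from/to rewrite; return the URL unchanged if nothing matches."""
    return re.sub(pattern, TARGET, url, count=1)


if __name__ == "__main__":
    for url in (
        "http://www.google.com/firefox/",
        "http://www.google.com/firefox?q=test",
        "http://www.google.com/firefox/foo",
    ):
        print(url)
        print("  unanchored:", rewrite(UNANCHORED, url))
        print("  anchored:  ", rewrite(ANCHORED, url))
```

With the unanchored form, /firefox?q=test becomes /webhp?q=test as intended,
but /firefox/foo has its slash swallowed and becomes /webhpfoo.  The anchored
form leaves both of those URLs untouched and only rewrites a bare /firefox or
/firefox/.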

>  
> -  <rule from="^http://(www\.)?google\.com/#" 
> -          to="https://encrypted.google.com/#"/>
> +  <!-- If any parameter (e.g. q or hl) is set in any Google domain,
> +       move it to the encrypted domain.-->
> +  <rule 
> +    from="^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/webhp"
> +      to="https://encrypted.google.com/webhp" />

This looks useful.
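
For anyone who wants to check which hosts the new pattern covers, a quick
Python sketch follows.  It is an illustration only: rewrite_webhp() is a
hypothetical helper, the sample URLs other than the google.com.sa one are my
own, and I am assuming Python's re agrees with JavaScript on this pattern.

```python
import re

# The "from" pattern and "to" target of the new webhp rule in the patch.
WEBHP = re.compile(r"^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/webhp")
TARGET = "https://encrypted.google.com/webhp"


def rewrite_webhp(url):
    """Rewrite a matching webhp URL; return other URLs unchanged."""
    return WEBHP.sub(TARGET, url, count=1)


if __name__ == "__main__":
    for url in (
        "http://www.google.com.sa/webhp?hl=ar",  # example from the patch description
        "http://google.fr/webhp",
        "http://www.google.co.jp/webhp?hl=ja",
        "http://www.google.com/webhp",
        "http://www.google.org/webhp",           # three-letter TLD, not covered
    ):
        print(url, "->", rewrite_webhp(url))
```

The query string is carried over because the pattern stops at /webhp and only
the matched prefix is replaced.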
>  
>    <rule from="^http://(www\.)?google\.com/$" 
>            to="https://encrypted.google.com/"/>
>  
> -  <!-- most google international sites look like "google.fr" -->
> -
> +  <!-- most google international sites look like "google.fr".
> +       some look like "google.co.jp".
> +       and some crazy ones like "google.com.au".-->
>    <rule 
> -    from="^http://(www\.)?google\.[^/@:][^/@:]/(search\?|firefox|#)" 
> +    from="^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/(search\?|#)" 
>        to="https://encrypted.google.com/#" />

I may retain two rules in this section.  Readability is more important than
minimising the number of regexps, I think.
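
As a rough illustration of why I would rather keep them apart, the sketch
below compares the merged pattern with the two patterns it replaces.  It is
only a sketch: the helper names are hypothetical, the "firefox" alternative
is dropped from the two old patterns so that only the host part differs, and
Python's re is assumed to behave like the extension's JavaScript regexps here.

```python
import re

# Merged pattern from the patch.
MERGED = re.compile(
    r"^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/(search\?|#)"
)

# The two preexisting patterns, with "firefox" removed from the alternation.
SPLIT = [
    re.compile(r"^http://(www\.)?google\.[^/@:][^/@:]/(search\?|#)"),
    re.compile(r"^http://(www\.)?google\.com?\.[^/@:][^/@:]/(search\?|#)"),
]


def merged_matches(url):
    """True if the merged pattern matches the URL."""
    return MERGED.match(url) is not None


def split_matches(url):
    """True if either of the two separate patterns matches the URL."""
    return any(p.match(url) is not None for p in SPLIT)


if __name__ == "__main__":
    for url in (
        "http://www.google.fr/search?q=x",
        "http://www.google.co.jp/search?q=x",
        "http://www.google.com.au/#q=x",
        "http://www.google.com/#q=x",
        "http://google/search?q=x",
    ):
        print(url, merged_matches(url), split_matches(url))
```

Both forms agree on google.fr, google.co.jp and google.com.au.  Because both
optional groups can be empty, the merged pattern additionally matches
google.com and even a bare "google" host, which the two separate patterns do
not; that is easy to miss when reading the single regexp.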

>  
> -  <!-- some look like "google.co.jp" -->
> -  <!-- and some crazy ones like "google.com.au" -->
> -  <rule 
> -    from="^http://(www\.)?google\.com?\.[^/@:][^/@:]/(search\?|firefox|#)" 
> -      to="https://encrypted.google.com/#" />
>    <!-- Completion urls look like this: 
>  
>  http://clients2.google.co.jp/complete/search?hl=ja&client=hp&expIds=17259,24660,24729,24745&q=m&cp=1 HTTP/1.1\r\n



-- 
Peter Eckersley                            pde at eff.org
Senior Staff Technologist         Tel  +1 415 436 9333 x131
Electronic Frontier Foundation    Fax  +1 415 436 9993


