[HTTPS-E Rulesets] Updates to Google.xml
Peter Eckersley
pde at eff.org
Tue Jan 18 15:28:55 PST 2011
(Moving this to the -rules mailing list)
Osama, I've made a few inline comments on this patch below.
On Fri, Dec 24, 2010 at 11:29:36PM +0300, Osama Khalid wrote:
> Hello,
>
> I attached a patch to the Google Search ruleset. It fixes the
> following:
> * URLs like: http://www.google.com.sa/webhp?hl=ar are not currently
> handled. Fixing that by adding a general pattern that handles
> everything related to webhp and removing duplicated old patterns
> (which will be handled by the new pattern).
> * Using a webhp-like general pattern to merge two different patterns
> that deal with "/search" and "/#".
> * URLs like: http://www.google.com/firefox?q=test are not supposed to
> perform a search query, but the ruleset currently redirects them to
> https://encrypted.google.com/search?q=test. Fixing that by
> redirecting them to https://encrypted.google.com/webhp?q=test
>
> This patch is licensed under GPLv2+. All my future contributions to
> the HTTPSEverywhere projects are under the same license unless
> otherwise noted.
>
> --Osama Khalid
> diff --git a/src/chrome/content/rules/Google.xml b/src/chrome/content/rules/Google.xml
> index 812669c..dd8e4c9 100644
> --- a/src/chrome/content/rules/Google.xml
> +++ b/src/chrome/content/rules/Google.xml
> @@ -28,17 +28,14 @@
> <!-- Some Google pages can generate naive links back to the
> unencrypted version of encrypted.google.com, which is a
> 301 but theoretically vulnerable to SSL stripping. -->
> -
> <rule from="^http://encrypted\.google\.com/"
> to="https://encrypted.google.com/"/>
>
> <!-- The most basic case. -->
> -
> <rule from="^http://(www\.)?google\.com/search"
> to="https://encrypted.google.com/search"/>
>
> <!-- A very annoying exception that we seem to need for the basic case -->
> -
> <exclusion pattern="^http://(www\.)?google\.com/search.*tbs=shop" />
> <exclusion pattern="^http://clients[0-9]\.google\.com/.*client=products.*" />
> <exclusion pattern="^http://suggestqueries\.google\.com/.*client=products.*" />
> @@ -50,34 +47,26 @@
> <!-- But not the forums, bizarrely. -->
> <exclusion pattern="^http://www\.google\.com/support/forum([\?/].*)?$"/>
>
> - <!-- There are two distinct cases for these firefox searches -->
> -
> - <rule from="^http://(www\.)?google\.com/firefox/?$"
> - to="https://encrypted.google.com/"/>
> -
> - <rule from="^http://(www\.)?google\.com/firefox"
> - to="https://encrypted.google.com/search"/>
> -
> - <rule from="^http://(www\.)?google\.com/webhp"
> + <!-- Firefox homepage isn't currently available in HTTPS.-->
> + <rule from="^http://(www\.)?google\.com/firefox/?"
> to="https://encrypted.google.com/webhp"/>
I presume the "/?" here is a bug? The similar preexisting construction was
"/?$", which would only strip a slash if it is the last character in the URI.
Without the "$", a path such as /firefox/foo would be rewritten to /webhpfoo.
>
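The difference between "/?" and "/?$" can be checked with a small sketch. It is written in Python rather than the JavaScript regexp engine the extension uses, on the assumption that the two behave the same for patterns this simple. The `rewrite` helper and the `ANCHORED_FROM` variant (the patch's pattern with "$" appended, pointed at the same target) are hypothetical and only serve the comparison.

```python
import re

# Pattern as written in the patch.
PATCH_FROM = r"^http://(www\.)?google\.com/firefox/?"
# Hypothetical variant: the same pattern with "$" appended.
ANCHORED_FROM = r"^http://(www\.)?google\.com/firefox/?$"
TO = "https://encrypted.google.com/webhp"


def rewrite(pattern, target, url):
    """Return the rewritten URL, or None if the pattern does not match."""
    new_url, count = re.subn(pattern, target, url, count=1)
    return new_url if count else None


for url in (
    "http://www.google.com/firefox?q=test",
    "http://www.google.com/firefox/",
    "http://www.google.com/firefox/foo",
):
    print(url)
    print("  patch:   ", rewrite(PATCH_FROM, TO, url))
    print("  anchored:", rewrite(ANCHORED_FROM, TO, url))
```

With the unanchored pattern the optional slash is consumed wherever it appears, so the third URL loses the separator between "webhp" and "foo". The anchored variant avoids that, but it no longer matches the "?q=test" case the patch is meant to handle, so a fix would probably need a different construction than simply restoring the "$".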
> - <rule from="^http://(www\.)?google\.com/#"
> - to="https://encrypted.google.com/#"/>
> + <!-- If any parameter (e.g. q or hl) is set in any Google domain,
> + move it to the encrypted domain.-->
> + <rule
> + from="^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/webhp"
> + to="https://encrypted.google.com/webhp" />
This looks useful.
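As a rough check of which hosts the new webhp pattern covers, here is a minimal Python sketch. The `rewrite_webhp` helper and the sample URLs (other than the google.com.sa one from the patch description) are assumptions made for illustration, and Python's `re` stands in for the extension's JavaScript regexps.

```python
import re

# "from" and "to" copied from the patch.
WEBHP_FROM = re.compile(
    r"^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/webhp"
)
WEBHP_TO = "https://encrypted.google.com/webhp"


def rewrite_webhp(url):
    """Apply the webhp rule once; return None when it does not match."""
    new_url, count = WEBHP_FROM.subn(WEBHP_TO, url, count=1)
    return new_url if count else None


for url in (
    "http://www.google.com.sa/webhp?hl=ar",  # example from the patch description
    "http://www.google.com/webhp?hl=en",
    "http://google.co.jp/webhp",
    "http://www.google.fr/webhp?hl=fr",
    "http://www.google.org/webhp",           # three-letter TLD, not covered
):
    print(url, "->", rewrite_webhp(url))
```

The query string survives because only the matched prefix is replaced. Note that both optional groups can be empty, so the pattern would also accept a bare "google" host; that is harmless in practice but worth knowing when reading the regexp.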
>
> <rule from="^http://(www\.)?google\.com/$"
> to="https://encrypted.google.com/"/>
>
> - <!-- most google international sites look like "google.fr" -->
> -
> + <!-- most google international sites look like "google.fr".
> + some look like "google.co.jp".
> + and some crazy ones like "google.com.au".-->
> <rule
> - from="^http://(www\.)?google\.[^/@:][^/@:]/(search\?|firefox|#)"
> + from="^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/(search\?|#)"
> to="https://encrypted.google.com/#" />
I may retain two rules in this section. Readability is more important than
minimising the number of regexps, I think.
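To see what merging the two international rules changes, the old pair and the new single pattern can be compared on a few sample URLs. This is only a sketch: the sample URLs and helper names are made up for illustration, it uses Python's `re` instead of the extension's JavaScript engine, and it does not model rule ordering or exclusions.

```python
import re

# The two patterns removed by the patch.
OLD_CC = r"^http://(www\.)?google\.[^/@:][^/@:]/(search\?|firefox|#)"
OLD_CC_SLD = r"^http://(www\.)?google\.com?\.[^/@:][^/@:]/(search\?|firefox|#)"
# The single pattern added by the patch.
MERGED = r"^http://(www\.)?google(\.com?)?(\.[^/@:][^/@:])?/(search\?|#)"


def old_rules_match(url):
    """True if either of the two removed patterns matches."""
    return bool(re.search(OLD_CC, url) or re.search(OLD_CC_SLD, url))


def merged_rule_matches(url):
    """True if the merged pattern matches."""
    return bool(re.search(MERGED, url))


SAMPLES = [
    "http://www.google.fr/search?q=x",
    "http://google.co.jp/search?q=x",
    "http://www.google.com.au/#q=x",
    "http://www.google.com/search?q=x",  # plain .com
    "http://www.google.fr/firefox",      # firefox alternative was dropped
]

for url in SAMPLES:
    print(f"{url:40} old={old_rules_match(url)} merged={merged_rule_matches(url)}")
```

On these samples the merged pattern agrees with the old pair for the country-code hosts, additionally matches plain google.com, and no longer matches "/firefox". Whether the google.com overlap matters depends on how it interacts with the earlier "/search" rule, which this sketch does not cover. Keeping two separate rules makes each case easier to read at the cost of one more regexp.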
>
> - <!-- some look like "google.co.jp" -->
> - <!-- and some crazy ones like "google.com.au" -->
> - <rule
> - from="^http://(www\.)?google\.com?\.[^/@:][^/@:]/(search\?|firefox|#)"
> - to="https://encrypted.google.com/#" />
> <!-- Completion urls look like this:
>
> http://clients2.google.co.jp/complete/search?hl=ja&client=hp&expIds=17259,24660,24729,24745&q=m&cp=1 HTTP/1.1\r\n
--
Peter Eckersley pde at eff.org
Senior Staff Technologist Tel +1 415 436 9333 x131
Electronic Frontier Foundation Fax +1 415 436 9993