[HTTPS-Everywhere] tr says ALDI.xml contains non-ASCII character(s)

Seth David Schoen schoen at eff.org
Wed Mar 16 11:31:16 PDT 2011


Andreas F writes:

> The German supermarket chain ALDI has domain names that contain non-ASCII characters. The "trivial-validate" script uses 'tr' to detect non-ASCII characters, and the validation fails (for me). The 'tr' manual says that "[:print:]" represents "all printable characters, including space". Is the 'tr' character class "[:print:]" locale-specific? Would the test pass if run on a system with a German locale?
> 
> $ egrep '(from|to)=' ./src/chrome/content/rules/ALDI.xml | tr -d '[:print:][:space:]'
> üü
> 
> $ man tr
> 
> [:print:]
>        all printable characters, including space

The current version in git will still build an XPI if that test fails.
But you're right that '[:print:]' is locale-specific, so the test will
probably not reject characters that count as printable in the builder's locale.
Although I have sometimes switched locales on my computer for language
practice, this problem doesn't bother me much because the official
releases are always built on en_US.UTF-8 (and they will include the
ALDI rule, which is not a homoglyph attack -- and none of the characters
in [:print:] in a German locale are homoglyphs of each other).
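If the goal is simply to make the existing check give the same answer regardless of the builder's locale, one option is to stop relying on the locale at all. Below is a minimal sketch, assuming the rule files are UTF-8: it reproduces the idea of `egrep '(from|to)=' FILE | tr -d '[:print:][:space:]'` with "printable" fixed to the C-locale definition (ASCII 0x20 to 0x7E) plus ASCII whitespace. The function name `non_ascii_in_rules` is hypothetical. Note that this would still reject the ALDI rule; it only makes the result reproducible and does nothing about homoglyphs.

```python
import re
import sys

# Same line filter as the egrep in trivial-validate.
FROM_TO = re.compile(r"(from|to)=")

# C-locale [:print:] (0x20..0x7E) plus [:space:] (space, \t \n \v \f \r).
ALLOWED = set(map(chr, range(0x20, 0x7F))) | set("\t\n\v\f\r")


def non_ascii_in_rules(text: str) -> str:
    """Return every character on from=/to= lines that is not printable ASCII
    or ASCII whitespace, independent of the current locale (hypothetical helper)."""
    leftovers = []
    for line in text.splitlines():
        if FROM_TO.search(line):
            leftovers.extend(ch for ch in line if ch not in ALLOWED)
    return "".join(leftovers)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Decode explicitly instead of trusting LANG/LC_ALL.
        with open(path, encoding="utf-8") as f:
            leftovers = non_ascii_in_rules(f.read())
        if leftovers:
            print(f"{path} contains non-ASCII character(s): {leftovers}")
```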

What we're trying to prevent is a rule like

  <rule from="^http://(www\.)?paypal\.com/" to="https://www.pаypаl.com/"/>

The vowels in the destination there are U+0430 CYRILLIC SMALL LETTER A,
which looks like U+0061 LATIN SMALL LETTER A (in many fonts).  Or

  <rule from="^http://(www\.)?paypal\.com/" to="https://www.pаyраl.com/"/>

which is the same but with the second "p" replaced with U+0440 CYRILLIC
SMALL LETTER ER.
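Because these look-alikes are invisible in most fonts, it can help to print the code point and Unicode name of each non-ASCII character in a suspect string. A small sketch using Python's standard `unicodedata` module; the function name `non_ascii_chars` is just for illustration, and the host is written with `\u` escapes so the Cyrillic letters are unambiguous:

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[str, str]]:
    """Return (code point, Unicode name) for every non-ASCII character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]


if __name__ == "__main__":
    # Host from the second example rule: p, Cyrillic a, y, Cyrillic er, Cyrillic a, l.
    for cp, name in non_ascii_chars("www.p\u0430y\u0440\u0430l.com"):
        print(cp, name)
```

Run on that host, it lists U+0430 CYRILLIC SMALL LETTER A, U+0440 CYRILLIC SMALL LETTER ER and U+0430 again, while the genuine "www.paypal.com" produces no output.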

If someone wants to propose a test that will detect this situation better
than the tr thing, I'd be happy to use that test instead of what we have
now!
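One possible direction, offered only as a sketch: instead of rejecting all non-ASCII, flag a `to=` host label whose letters come from more than one script. That accepts a label like "aldi-süd" (every letter is Latin) but catches "pаypаl" (Latin mixed with Cyrillic). Assumptions: rule targets look like `to="https://HOST/..."`; Python's `unicodedata` has no Script property, so the first word of the character name ("LATIN", "CYRILLIC", ...) is used as a rough stand-in. It will miss whole-script spoofs (a label written entirely in Cyrillic) and may flag legitimately mixed labels such as Japanese ones. All function names here are hypothetical.

```python
import re
import sys
import unicodedata
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed shape of a rewrite target: to="http(s)://HOST/...". Captures HOST.
TO_HOST = re.compile(r'to="https?://([^/"]+)')


def letter_script(ch: str) -> Optional[str]:
    """Guess a letter's script from the first word of its Unicode name.

    Heuristic: e.g. 'LATIN SMALL LETTER U WITH DIAERESIS' -> 'LATIN'.
    Non-letters (digits, '-', '$') return None and are ignored.
    """
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_labels(host: str) -> List[str]:
    """Return the DNS labels of host whose letters come from more than one script."""
    flagged = []
    for label in host.split("."):
        scripts = {s for s in map(letter_script, label) if s is not None}
        if len(scripts) > 1:
            flagged.append(label)
    return flagged


def check_rules(lines: Iterable[str]) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (line number, host, flagged labels) for to= hosts that mix scripts."""
    for lineno, line in enumerate(lines, start=1):
        for host in TO_HOST.findall(line):
            bad = mixed_script_labels(host)
            if bad:
                yield lineno, host, bad


if __name__ == "__main__":
    found = False
    for path in sys.argv[1:]:
        # Decode explicitly so the result does not depend on the build locale.
        with open(path, encoding="utf-8") as f:
            for lineno, host, bad in check_rules(f):
                print(f"{path}:{lineno}: mixed-script host {host!r}: {bad}")
                found = True
    if found:
        sys.exit(1)
```

A proper implementation would use the Unicode Script property and the confusables data rather than character names, but the name-based heuristic needs nothing outside the standard library.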

-- 
Seth Schoen
Senior Staff Technologist                         schoen at eff.org
Electronic Frontier Foundation                    https://www.eff.org/
454 Shotwell Street, San Francisco, CA  94110     +1 415 436 9333 x107