[HTTPS-Everywhere] tr says ALDI.xml contains non-ASCII character(s)
Seth David Schoen
schoen at eff.org
Wed Mar 16 11:31:16 PDT 2011
Andreas F writes:
> The german supermarket chain ALDI has domain names that contain non-ascii characters. The "trivial-validate" script uses 'tr' to detect non-ascii characters, and the validation fails (for me). The 'tr' manual says that "[:print:]" represents "all printable characters, including space". Is the 'tr' character class "[:print:]" locale specific? Would the test pass if run on a system with german locale?
>
> $ egrep '(from|to)=' ./src/chrome/content/rules/ALDI.xml | tr -d '[:print:][:space:]'
> üü
>
> $ man tr
>
> [:print:]
>        all printable characters, including space
The current version in git will still build an XPI if that test fails.
But you're right that '[:print:]' is locale-specific, so the test will
probably not reject characters that count as printable in whatever
locale it is run under.
Although I have sometimes switched locales on my computer for language
practice, this problem doesn't bother me much because the official
releases are always built on en_US.UTF-8 (and they will include the
ALDI rule, which is not a homoglyph attack -- and none of the characters
in [:print:] in a German locale are homoglyphs of each other).
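As a rough illustration of a check that does not depend on the locale,
the sketch below looks at the from= and to= lines by code point instead
of going through tr's character classes. It is only a sketch: the helper
name non_ascii_chars and the sample lines are made up for illustration
and are not part of trivial-validate, and unlike the tr pipeline it only
reports characters above U+007F (it ignores ASCII control characters).

```python
import re

# Hypothetical helper, not part of trivial-validate: collect every
# character above U+007F on lines that carry a from= or to= attribute.
ATTR_LINE = re.compile(r'(from|to)=')

def non_ascii_chars(lines):
    found = []
    for line in lines:
        if ATTR_LINE.search(line):
            found.extend(ch for ch in line if ord(ch) > 0x7F)
    return found

# Made-up sample lines for illustration only.
sample = [
    '<rule from="^http://example\\.com/" to="https://example.com/"/>',
    '<rule from="^http://b\u00fccher\\.example/" to="https://b\u00fccher.example/"/>',
]
print(non_ascii_chars(sample))
```

Because the comparison is on code points, the result should be the same
whichever locale the build machine happens to use.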
The thing that we're concerned with preventing is
<rule from="^http://(www\.)?paypal\.com/" to="https://www.pаypаl.com/"/>
The vowels in the destination there are U+0430 CYRILLIC SMALL LETTER A,
which looks like U+0061 LATIN SMALL LETTER A (in many fonts). Or
<rule from="^http://(www\.)?paypal\.com/" to="https://www.pаyраl.com/"/>
which is the same but with the second "p" also replaced, by U+0440
CYRILLIC SMALL LETTER ER.
If someone wants to propose a test that will detect this situation better
than the tr thing, I'd be happy to use that test instead of what we have
now!
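
One possible shape for such a test, offered as a sketch rather than a
finished proposal, is to flag any hostname label in a to= attribute that
mixes letters from more than one script. The functions scripts_in and
mixed_script_labels are hypothetical names, and the script of a
character is only approximated here by the first word of its Unicode
name, since Python's standard library has no direct script property.

```python
import re
import unicodedata

def scripts_in(text):
    """Approximate the set of scripts used by the letters in text.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. LATIN, CYRILLIC or GREEK, stands
    in for its script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_labels(rule_line):
    """Return the dot-separated labels of the to= host that mix scripts."""
    match = re.search(r'to="https?://([^/"]+)', rule_line)
    if not match:
        return []
    return [label for label in match.group(1).split(".")
            if len(scripts_in(label)) > 1]

# A spoofed rule written with escapes so the Cyrillic letters are
# visible: U+0430 is CYRILLIC SMALL LETTER A, U+0440 is ER.
spoofed = ('<rule from="^http://(www\\.)?paypal\\.com/" '
           'to="https://www.p\u0430y\u0440\u0430l.com/"/>')
honest = ('<rule from="^http://(www\\.)?paypal\\.com/" '
          'to="https://www.paypal.com/"/>')

print(mixed_script_labels(spoofed))  # the mixed Latin/Cyrillic label
print(mixed_script_labels(honest))   # empty list
```

A label such as the one in the ALDI rule stays unflagged because "ü" is
still a Latin letter. The sketch would not catch a label written
entirely in one lookalike script, so it narrows the problem rather than
solving it.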
--
Seth Schoen
Senior Staff Technologist schoen at eff.org
Electronic Frontier Foundation https://www.eff.org/
454 Shotwell Street, San Francisco, CA 94110 +1 415 436 9333 x107