[PrivacyBadger] tracking CGI args

Greg Lindahl lindahl at pbm.com
Tue Mar 22 12:36:59 PDT 2016


For a long while I've been annoyed by tracking CGI args, like the
Urchin/Google Analytics utm_* args. Like cookies, they sometimes have
long, unique-looking values and sometimes short ones.

There are a few browser plugins in this area:

https://addons.mozilla.org/en-US/firefox/addon/au-revoir-utm/?src=cb-dl-toprated
https://chrome.google.com/webstore/detail/tracking-token-stripper/kcpnkledgcbobhkgimpbmejgockkplob?hl=en

but they are driven by static lists (like utm_*), which is not as
flexible and comprehensive as the Privacy Badger approach.
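
To make the static-list approach concrete, here is a minimal sketch of
what such a stripper might do. It is not taken from either plugin: the
function name and the one-entry prefix list are assumptions made for
illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical static list; the real plugins ship their own lists.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_args(url, prefixes=TRACKING_PREFIXES):
    """Return url with every query arg whose name starts with a listed prefix removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(prefixes)
    ]
    # Rebuild the URL with only the surviving args.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Any tracking arg that is missing from the list passes through untouched,
which is the limitation described above.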

I recently started working at the Internet Archive, where our crawler
and Wayback Machine playback are both needlessly confused by these CGI
args, in both the long, unique form and the short form. Our crawler has
the ability to discover some of these tracking args by noticing that
multiple URLs, differing only in their CGI args, deliver pages with
the same hash.
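
The same-hash idea can be sketched roughly as follows. This is not the
crawler's actual code: the (url, content_hash) record format and the
function name are assumptions, and a real implementation would need
more evidence than a single matching pair before treating an arg as
tracking.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import parse_qsl, urlsplit


def candidate_tracking_args(records):
    """records: iterable of (url, content_hash) pairs (assumed format).

    Returns the names of args that differ between two URLs which share
    scheme, host, and path and delivered content with the same hash.
    """
    groups = defaultdict(list)
    for url, digest in records:
        parts = urlsplit(url)
        args = dict(parse_qsl(parts.query, keep_blank_values=True))
        groups[(parts.scheme, parts.netloc, parts.path, digest)].append(args)

    candidates = set()
    for arg_sets in groups.values():
        for a, b in combinations(arg_sets, 2):
            for name in a.keys() | b.keys():
                # An arg that is absent on one side, or has a different
                # value, did not change the page content.
                if a.get(name) != b.get(name):
                    candidates.add(name)
    return candidates
```

For example, if two URLs for the same path differ only in utm_source and
hash identically, utm_source is reported, while an arg such as id that
changes the hash is not.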

If you're interested at all, I could imagine several ways you might
want to proceed. I'm happy to provide data from IA's crawler, and I'd
also like to consume any data that you folks generate.

-- greg


