Closed Bug 572667 Opened 15 years ago Closed 15 years ago

Remove the Accept-Language header from HTTP requests and the accompanying UI from prefs

Categories

(Core :: Networking: HTTP, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: hsivonen, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl, privacy)

Steps to reproduce:
1) Load http://www.delorie.com:81/some/url.txt

Actual results:
An Accept-Language header is sent. The default value of the header depends on the UI language pack in use, but the value can be configured via the preference UI. This means that configuration-based entropy is exposed and can be used for fingerprinting. See https://panopticlick.eff.org/

The utility of the Accept-Language header is negligible. This feature of HTTP has been evangelized for years and years, yet the feature has failed on the market. It's rare for a Web site to make its content available in multiple languages in such equal quality that the choice of language can be automated and the user doesn't need to choose the language on a case-by-case basis. Even sites like the official site of the European Union, which puts a great deal of money and effort into translation, make the language selection part of the site UI. Google guesses the user's language from the IP address and ignores Accept-Language. Thus, the feature provides no benefit, or a very rare benefit, to the users of Firefox while making Firefox instances more fingerprintable.

Expected results:
Expected the Accept-Language header not to be sent and expected there to be no UI for configuring preferred content languages.
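To make the fingerprinting concern concrete: whatever weighting a browser uses, the Accept-Language value is a deterministic function of the user's ordered language list, so it carries exactly as much identifying information as that list. The Python sketch below serializes such a list; the function name and the linear 1/n weighting scheme are assumptions for illustration, not Firefox's actual code.

```python
from typing import Sequence


def build_accept_language(languages: Sequence[str]) -> str:
    """Serialize an ordered preference list into an Accept-Language value.

    Hypothetical weighting scheme (not necessarily what any browser does):
    the first tag has an implicit q=1, later tags get weights that fall
    linearly by 1/n, rounded half-up to one decimal and kept within 0.1..0.9.
    """
    n = len(languages)
    parts = []
    for i, tag in enumerate(languages):
        if i == 0:
            parts.append(tag)  # q=1 is the default, so it is omitted
            continue
        # Integer arithmetic for round(10 * (n - i) / n) with halves rounded up.
        tenths = min(9, max(1, (20 * (n - i) + n) // (2 * n)))
        parts.append(f"{tag};q=0.{tenths}")
    return ",".join(parts)
```

Because the same list always yields the same string, the header narrows down who a visitor is in exactly the way the configured list does; a rare list stands out far more than a common default such as en-us,en.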
Oh, and one reason why this feature of HTTP isn't used much in the wild is that it's bad for a site's search engine indexability if distinct translations don't have distinct URLs.
Hey, I'm using that header! Guessing a language from an ip-address is utterly wrong. Even if you ignore multilingual countries like Belgium and Canada, you will annoy every tourist who doesn't understand the local language. I do NOT want to receive Finnish text just because I happen to be in Finland, right? Neither do I want to receive English as a default. OK, not enough websites are using the Accept-Language header. But this privacy thing (removing various info from User-Agent) is going too far. Ever heard the expression "Throwing the baby out with the bath water"? And about that EU website: I used to work for their conference center, a long time ago. The internal website really was using that header to distinguish between languages (16 different ones back then), and as far as I know, they still do. I don't know why the external website isn't using it, though.
I disagree with essentially everything Henri said in comment 0. The utility is not at all negligible. It is used in the wild, including on almost all major Mozilla sites. When a user visits AMO, SUMO, mozilla.com, they are redirected to a localized version based on their Accept-Language header. For example, if I visit http://support.mozilla.com/, I get redirected to http://support.mozilla.com/en-US/. If I do so in a Spanish version of Firefox, I get redirected to http://support.mozilla.com/es/. (Explicit locale selection is possible later after the redirect.) Django, a popular framework that AMO, SUMO, and many other sites use, has the same behavior. GeoIP is not a sufficient replacement. Just because a user happens to be in Germany does not mean they are using a German version of Firefox or are a German speaker. Removing Accept-Language breaks the web.
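The redirect behavior described above (visit / and land on /en-US/ or /es/) can be sketched as server-side negotiation. This is a minimal sketch assuming a hypothetical site with four translations; the function names and the simplified matching rule (exact tag first, then primary subtag, so es-ES falls back to es) are illustrative, not SUMO's or Django's actual code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical locales a site has translations for; the first is the fallback.
SUPPORTED_LOCALES: Tuple[str, ...] = ("en-US", "es", "de", "fr")


def parse_accept_language(header: str) -> List[str]:
    """Language tags ordered by q-value (ties keep their header order)."""
    ranked: List[Tuple[float, int, str]] = []
    for index, item in enumerate(header.split(",")):
        tag, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if tag and q > 0:
            ranked.append((-q, index, tag))
    return [tag for _, _, tag in sorted(ranked)]


def negotiate_locale(header: str, supported: Sequence[str] = SUPPORTED_LOCALES) -> str:
    """Simplified lookup: exact match first, then primary-subtag match."""
    for tag in parse_accept_language(header):
        tag = tag.lower()
        for locale in supported:
            if locale.lower() == tag:
                return locale
        primary = tag.split("-")[0]
        for locale in supported:
            if locale.lower().split("-")[0] == primary:
                return locale
    return supported[0]


def localized_redirect(path: str, header: Optional[str]) -> str:
    """Path to redirect a request for `path` to, e.g. "/" -> "/es/"."""
    return f"/{negotiate_locale(header or '')}{path}"
```

For example, localized_redirect("/", "es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3") returns "/es/", and a request with no header falls back to "/en-US/". Explicit locale selection after the redirect would then override this guess.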
Yes, this feature is not as popular as it should be. But it is used! The unique-URL problem is solved very simply: by redirecting from the main URL. And it is used in intranets, where the index-uniqueness of the UI is irrelevant. Please don't remove a good thing just because you or your pals don't use it. People who are paranoid enough can remove that preference manually. And do you really think that the default values of this setting actually identify anybody? I think people divide into two groups: first, those who are aware of this feature and can choose between usability and privacy (Accept-Language? Oh, how much leakage!), and second, those who are not aware of this feature and use the default settings, which make them indistinguishable among all users of the second type. Please don't remove this! If only Google knew how much hate I produce every time I open their page in a language that isn't mine, just because of my IP...
(In reply to comment #0)
> Google guesses the user's language from the IP address and ignores
> Accept-Language.

Are you sure about that? Changing the preferred language in Options and loading http://www.google.com/ changes the language on that page for me.
If there is no Google cookie, then it will use Accept-Language as the default language. It used to be different, but that wouldn't have worked in countries like Belgium. Several years ago, I exchanged emails with people from Google who were testing this (as there were too many complaints; the public bug is in Mozilla's Bugzilla somewhere). For a while it was only available in countries like Belgium, but I think that it now works globally (at least in all the countries that I have visited in the last couple of years).

Note that I'm not against removing various fields from User-Agent when they don't provide much value or really reveal too much about the user (OS info, for instance). But not a value that users can set and change themselves. Besides, why is it bad for privacy that a website can discover your preferred language? I've set the preference because I WANT the content in another language!
(In reply to comment #5)
> (In reply to comment #0)
>
> > Google guesses the user's language from the IP address and ignores
> > Accept-Language.
>
> Are you sure about that? Changing the preferred language in Options and loading
> http://www.google.com/ changes the language on that page for me.

For example, I see the Blogger interface only in the language of my country of residence. I see no way to change it.
Facebook.com, Livejournal.com, Youtube.com utilize Accept-Language to serve localized UI. I think they alone prove that Accept-Language is far from being a "market failure".
(In reply to comment #2)
> Guessing a language from an ip-address is utterly wrong.

That's my point! It should be logically obvious that using an IP-to-geography mapping to guess the language is utterly wrong, yet that's what Google does, so just using Accept-Language as specced isn't working for Google. (Closer experimentation shows that they actually do pay attention to *some* Accept-Language values even if they ignore English. Preferring Swedish over English turns google.fi into Swedish.)

(In reply to comment #3)
> It is used in the wild, including on almost all major Mozilla sites.

I think we shouldn't pay too much attention to Mozilla's own sites when balancing fingerprintability against the exposure of information that also has legitimate uses, because Mozilla's sites make use of the information Firefox exposes to an abnormal degree. (Though, granted, language is more general than e.g. point release.)

> When a user visits AMO, SUMO, mozilla.com, they are redirected to a localized
> version based on their Accept-Language header. For example, if I visit
> http://support.mozilla.com/, I get redirected to
> http://support.mozilla.com/en-US/. If I do so in a Spanish version of Firefox,
> I get redirected to http://support.mozilla.com/es/. (Explicit locale selection
> is possible later after the redirect.)

OK. That shows that the exposure of the preferred content language is made use of, but it also shows that the design of the Accept-Language feature is a failure, because that's *not* how Accept-Language is supposed to be used per spec. (I'm mentioning this only to make the point that I wasn't completely crazy to say that the HTTP feature has failed. Whether the data is being used as designed per spec has no relevance to fingerprintability.)

> Removing Accept-Language breaks the web.

An alternative, though a much lesser win for privacy, would be making it super-easy to expose the language defaults of the en-US version to the Web even when using a localized UI. This way, readers of other languages could opt to use the UI in their own language without having to expose as much uniqueness for fingerprinting as using a localized build currently exposes (via the UA string and Accept-Charset). It's already possible to configure Accept-Language to be equivalent to the en-US default, but to do so, one needs to 1) know about the issue, 2) find the pref (which is not under "Privacy"!) and 3) also have an en-US build for comparison, as opposed to just checking a box in the Privacy prefs.

(In reply to comment #8)
> Facebook.com, Livejournal.com, Youtube.com utilize Accept-Language to serve
> localized UI.

I think this is a much more persuasive data point than Mozilla's own sites using the data.
(In reply to comment #6)
> Besides, why is it bad for the privacy that a website can discover your
> preferred language ? I've set the preference because I WANT the content in
> another language !

Revealing the language per se isn't bad for privacy. The problem is that exposing *anything* that varies from user to user but stays relatively constant for a given user can be used for fingerprinting, which enables tracking the user's activities on the Web. See the reference given in the bug description: https://panopticlick.eff.org/
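The "bits of identifying information" that Panopticlick reports come from surprisal: if a fraction p of users share a value, observing that value reveals -log2(p) bits. A minimal Python sketch, using an invented 1024-user sample (the header values and counts are hypothetical, not measured data):

```python
import math
from collections import Counter
from typing import Sequence


def surprisal_bits(value: str, population: Sequence[str]) -> float:
    """Bits of identifying information revealed by `value`: -log2(share of users with it)."""
    counts = Counter(population)
    if counts[value] == 0:
        raise ValueError("value not observed in the population sample")
    return -math.log2(counts[value] / len(population))


# Hypothetical sample of 1024 users' Accept-Language values.
SAMPLE = (
    ["en-us,en;q=0.5"] * 512
    + ["de-de,de;q=0.8,en-us;q=0.5,en;q=0.3"] * 256
    + ["fr-fr,fr;q=0.8,en-us;q=0.5,en;q=0.3"] * 255
    + ["fi,sv;q=0.5"]
)
```

In this made-up sample the en-US default reveals 1 bit while the single fi,sv user reveals 10 bits. Bits from different attributes only add up cleanly when the attributes are independent, but this is the sense in which a rare language configuration makes an otherwise common browser more fingerprintable.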
That the feature has failed on the market is just plain wrong; there are enough sites (not only Mozilla.org sites) that use this feature, for example openstreetmap.org.
Preferred content languages are used by SVG for the <switch systemLanguage="??"> matching. Here's an example from the SVG testsuite: http://www.w3.org/Graphics/SVG/Test/20061213/htmlObjectHarness/full-struct-cond-02-t.html We determine what text to display based on the order of your UI language prefs.
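For readers unfamiliar with <switch>: it renders the first direct child whose conditional attributes all evaluate to true. The sketch below assumes the systemLanguage rule from the SVG 1.1 specification (a user language matches a listed tag if it equals it, or equals a prefix of it followed by "-"); Gecko's actual handling, which per the comment above also takes the order of the language prefs into account, may differ. Children are modeled only by their systemLanguage attribute value, with None meaning the attribute is absent; the function names are hypothetical.

```python
from typing import Optional, Sequence


def system_language_matches(attr: Optional[str], user_langs: Sequence[str]) -> bool:
    """SVG 1.1-style systemLanguage test against the user's preferred languages."""
    if attr is None:
        return True  # no systemLanguage attribute: the condition is treated as true
    listed = [t.strip().lower() for t in attr.split(",") if t.strip()]
    for user in (u.lower() for u in user_langs):
        for lang in listed:
            # Exact match, or the user's tag is a prefix ending at a "-" boundary.
            if lang == user or lang.startswith(user + "-"):
                return True
    return False


def pick_switch_child(children: Sequence[Optional[str]], user_langs: Sequence[str]) -> Optional[int]:
    """Index of the first child whose systemLanguage test passes, or None."""
    for index, attr in enumerate(children):
        if system_language_matches(attr, user_langs):
            return index
    return None
```

With children ["de", "fr,en-US", None], a user preferring "en" gets child 1, while a user preferring only "ja" falls through to the unconditional child 2.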
(In reply to comment #10)
> Revealing the language per se isn't bad for privacy. The problem is that
> exposing *anything* that varies from user to user but stays relatively constant
> for a given user can be used for fingerprinting

Is it possible to keep the feature, but add some randomness to the quality factors included in the string (keeping the relative order, of course)?
(In reply to comment #13)
> Is it possible to keep the feature, but add some randomness to the quality
> factors included in the string (keeping the relative order of course)?

That wouldn't be worthwhile. For a competent fingerprinter, the amount of signal (which preferred languages and in which order) would be the same.
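A sketch of why randomized weights don't help, assuming the countermeasure proposed in comment #13; the function names and the three-decimal weights are hypothetical. The fingerprinter only needs to sort the tags by weight to recover the same ordered list every time.

```python
import random
from typing import Sequence, Tuple


def randomized_header(langs: Sequence[str], rng: random.Random) -> str:
    """Hypothetical countermeasure: random weights that keep the relative order."""
    # Distinct weights 0.001..0.999, sorted so that later languages weigh less.
    weights = sorted(rng.sample(range(1, 1000), len(langs) - 1), reverse=True)
    parts = [langs[0]] + [f"{lang};q=0.{w:03d}" for lang, w in zip(langs[1:], weights)]
    return ",".join(parts)


def fingerprint(header: str) -> Tuple[str, ...]:
    """What a fingerprinter keeps: the tags, ordered by weight (q=1 if absent)."""
    ranked = []
    for index, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        q = float(params.split("=", 1)[1]) if params.startswith("q=") else 1.0
        ranked.append((-q, index, tag))
    return tuple(tag for _, _, tag in sorted(ranked))
```

The generated header strings differ from one call to the next, yet fingerprint() returns ("fi", "sv", "en-us", "en") for every one of them when the preferences are fi, sv, en-us, en: the signal is which languages, in which order, and randomizing the weights leaves both intact.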
Then, would adding a few randomly selected languages at the end of the string help? Both the list and the order would be random.
(In reply to comment #15)
> Then, what about adding a few randomly selected languages at the end of the
> string help? Both the list and the order would be random.

That doesn't help either, as long as the fingerprinter knows that Firefox is doing that (and Firefox is well-known enough that competent fingerprinters would know). If the fingerprinter knows which part comes from configuration and which part is random, the fingerprinter can ignore the random part. In the simplest case, the fingerprinter could ignore everything but the first language, since randomizing the first language would break the whole feature.
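The same argument applied to random decoy languages, as a sketch: the decoy pool, the fixed count of two decoys and the helper names are all hypothetical, and weights are omitted so that order alone expresses preference. Both attacks from the comment are shown: keeping only the first language, and stripping a decoy tail whose length is publicly known.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical pool of decoy languages a browser might append at random.
DECOY_POOL: Tuple[str, ...] = ("it", "nl", "pl", "pt", "ja", "ko", "tr", "cs")


def header_with_random_tail(prefs: Sequence[str], rng: random.Random, extra: int = 2) -> str:
    """Configured languages followed by `extra` random decoys in random order."""
    pool = [lang for lang in DECOY_POOL if lang not in prefs]
    return ",".join(list(prefs) + rng.sample(pool, extra))


def tags(header: str) -> List[str]:
    return [item.split(";")[0].strip() for item in header.split(",")]


def first_language(header: str) -> str:
    """The crudest attack from the comment: keep only the top preference."""
    return tags(header)[0]


def strip_known_tail(header: str, extra: int = 2) -> Tuple[str, ...]:
    """If the decoy count is publicly known, drop exactly that many trailing tags."""
    t = tags(header)
    return tuple(t[: len(t) - extra])
```

Even though the header strings vary from request to request, both first_language() and strip_known_tail() return stable values, so the configured preferences remain usable as a fingerprint.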
This is the only HTTP-fingerprint bug I don’t want to see implemented, as I’ve used this header for projects before. Comment #0 needs more data to back up the claim ‘very rare benefit’. How are websites going to know the viewer’s language without this header? You can’t be suggesting that every website use IP lookups, or default to English and show flags? Eugh. This seems to be a case for Mozilla evangelism instead.
(In reply to comment #17)
> Comment #0 needs more data to back up the claim ‘very rare benefit’

Most sites on the Web are single-language, and many sites that are multilingual identify the user from a login anyway and could store a language pref in the login data (which could be initialized from an OpenID profile).

> How are websites going to know viewer language without this header, you can’t
> be suggesting every website use IP-lookups or default to English and show
> flags?

Without the header, sites wouldn't know the viewer's language unless the user took action to choose a language in site-provided UI. Multilingual sites that require login (Facebook) could ask the user to pick a language once and store it in the login profile. Sites that don't require login would have to offer links to navigate between language versions.

That said, it seems that this one isn't going to fly, so it would probably be more productive to morph this into a bug about making it easy (as in checking one checkbox) to make Firefox behave like the en-US version towards sites while showing the localized browser chrome to the user.
(In reply to comment #3)
> Removing Accept-Language breaks the web.

I'd go further than saying that. On occasion it's good to "break the web" if you have to fix something. The breakage will get fixed. On the other hand, this is the sort of change that would "damage the web" without replacing the removed ability. A 100% un-fingerprintable browser is a windmill quest. It's not realistic. The goal is to get any possible fingerprint down to an accuracy where it's no longer really useful. This feature is used, and heavily by Mozilla, so this is almost certainly a WONTFIX.

(Please don't morph long discussions like this; make another bug for alternate ideas if needed.)
Most people don't scrape the skin off of the tips of their fingers. Know why? Because the ridges that just happen to make up a fingerprint are actually very useful (and necessary) in order to grip things. I don't see any reason to remove Accept-Language from the browser. In fact, I don't recall ever asking Mozilla to prevent my browser from being fingerprinted in the first place. It seems like there has lately been a string of unannounced proposals (existing only in the form of bug reports) that serve simply to take an extreme reductionist viewpoint. And it seems that language-related features repeatedly wind up as a casualty of this. (I'm thinking, of course, of the removal of the Properties dialog box in bug 513147 that suddenly rendered moot all of my work on bug 356038, and caused bug 522913 to be marked as WONTFIX.)
(In reply to comment #5)
> > Google guesses the user's language from the IP address and ignores
> > Accept-Language.
> Are you sure about that? Changing the preferred language in Options and loading
> http://www.google.com/ changes the language on that page for me.

A site may use the IP origin as a fallback to determine the language, but it should prefer the header provided (which, by the way, all other browsers listed in bug 572650 send with their HTTP headers, so this reduction would be a novelty). There are valid use cases for i18n purposes, as was pointed out before. I'm wondering why a geolocation-enabled setting of "true" is apparently accepted whereas the browser language is supposed to be a privacy risk. I don't get that logic. Also, as it's a preference and even has a UI, users who choose not to broadcast language information can easily do that by emptying the box.
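The precedence argued for here (an explicit choice made on the site first, such as the Google cookie mentioned earlier in the thread, then Accept-Language, and an IP-based guess only as a fallback) can be sketched as follows. The supported-language list, the country table standing in for a GeoIP lookup, and the function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical site configuration.
SUPPORTED = ("en", "fi", "sv", "fr", "nl", "de")
COUNTRY_GUESS = {"FI": "fi", "BE": "nl", "DE": "de"}  # stand-in for a GeoIP lookup
DEFAULT = "en"


def preferred_tags(header: str) -> List[str]:
    """Accept-Language tags, highest q first (ties keep header order)."""
    ranked = []
    for index, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat as not acceptable
        if tag and q > 0:
            ranked.append((-q, index, tag.strip()))
    return [tag for _, _, tag in sorted(ranked)]


def choose_language(cookie_lang: Optional[str], accept_language: Optional[str],
                    country: Optional[str]) -> str:
    # 1. An explicit choice the user made on the site (cookie or profile) wins.
    if cookie_lang in SUPPORTED:
        return cookie_lang
    # 2. Otherwise honor the browser's Accept-Language preferences.
    for tag in preferred_tags(accept_language or ""):
        primary = tag.split("-")[0].lower()
        if primary in SUPPORTED:
            return primary
    # 3. Only then fall back to a location-based guess, and finally a default.
    return COUNTRY_GUESS.get((country or "").upper(), DEFAULT)
```

With this ordering, a French-speaking tourist in Finland sending fr,en;q=0.5 gets French rather than Finnish, while a request with no header at all from Finland still falls back to Finnish.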
Keywords: intl
I call for this to be WONTFIX. This has always been the preferred and only reliable way of showing websites in the language the user actually prefers, and we should not prevent making the user happy.
Yeah, we should WONTFIX this. Accept-Language is a useful web feature; reducing fingerprintability cannot be the only priority.

(In reply to comment #18)
> That said, it seems that this one isn't going to fly, so it would probably be
> more productive to morph this into a bug about making it easy (as in checking
> one checkbox) to make Firefox behave like the en-US version towards sites while
> showing the localized browser chrome to the user.

This sounds like add-on fodder. We have a reasonably simple and not too deeply buried UI for adjusting the preferred content languages.
I'll go ahead and mark this WONTFIX. I'm not a module peer, so anyone feeling strongly enough about it should feel free to reopen. Also feel free to file a bug for the suggestion in the last paragraph of comment 18, although I personally don't think it makes sense.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
(In reply to Henri Sivonen (:hsivonen) from comment #0)
> Google guesses the user's language from the IP address
> and ignores Accept-Language.

This is exactly the thing not to do. Good practice is to do the opposite. Anyway, Google is notoriously bad at talking to users in the language they want.

> Expected results:
> Expected the Accept-Language header not to be sent and expected there to be
> no UI for configuring preferred content languages.

If I understand you correctly, Firefox should give up respecting the standard because many others fail to respect it. I think Firefox's mission is to do the opposite, especially when the standard is good.

Regarding privacy, this would be a disproportionate removal of a feature. And it would not remove the privacy "issue"; it would just move it. The user would click her preferred language on the site, so the site would know anyway that the user prefers this language. And I don't know any user who thinks "I am disclosing a secret" when choosing a language on a site.

(In reply to Henri Sivonen (:hsivonen) from comment #18)
> That said, it seems that this one isn't going to fly, so it would probably
> be more productive to morph this into a bug about making it easy (as in
> checking one checkbox) to make Firefox behave like the en-US version towards
> sites while showing the localized browser chrome to the user.

Why this one? Who said "en-US" should be the default language of the world?
Um, I agree with everything, but this proposal was already marked as "won't fix" almost three years ago. So there isn't much point in arguing further against it, as a decision has been reached.