Measuring Bias in “Organic” Web Search with Benjamin Lockwood

By comparing results between leading search engines, we identify patterns in their algorithmic search listings. We find that each search engine favors its own services, in that each search engine links to its own services more often than other search engines link to those same services. But some search engines promote their own services significantly more than others. We examine patterns in these differences, and we flag keywords where the problem is particularly widespread.

Even excluding “rich results” (whereby search engines feature their own images, videos, maps, etc.), we find that Google’s algorithmic search results link to Google’s own services more than three times as often as other search engines link to Google’s services.
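
The cross-engine comparison can be illustrated with a minimal sketch. It assumes each engine's algorithmic results (with rich results already excluded) have been collected as ranked URL lists per keyword, and that a list of each owner's domains is available. The domain list, engine names, and sample URLs are hypothetical placeholders, not the study's data or code.

```python
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical ownership map; a real study needs a carefully vetted list.
OWNED_DOMAINS = {"google": {"google.com", "youtube.com", "blogger.com"}}

Results = Dict[str, List[str]]  # keyword -> algorithmic result URLs, in rank order

def is_owned(url: str, owner: str) -> bool:
    """True if the URL's host is one of owner's domains or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in OWNED_DOMAINS[owner])

def owner_link_rate(results: Results, owner: str) -> float:
    """Share of all algorithmic listings (pooled over keywords) that link to owner."""
    urls = [u for listing in results.values() for u in listing]
    return sum(is_owned(u, owner) for u in urls) / len(urls) if urls else 0.0

def self_preference_ratio(by_engine: Dict[str, Results], engine: str, owner: str) -> float:
    """How many times more often `engine` links to `owner` than other engines do on average."""
    own = owner_link_rate(by_engine[engine], owner)
    others = [owner_link_rate(r, owner) for e, r in by_engine.items() if e != engine]
    if not others:
        raise ValueError("need at least one comparison engine")
    avg = sum(others) / len(others)
    return own / avg if avg else float("inf")

# Toy results for two keywords (invented for illustration, not study data).
by_engine = {
    "google": {"email": ["https://mail.google.com/", "https://mail.yahoo.com/"],
               "video": ["https://www.youtube.com/", "https://vimeo.com/"]},
    "bing": {"email": ["https://mail.yahoo.com/", "https://outlook.com/"],
             "video": ["https://www.youtube.com/", "https://vimeo.com/"]},
}
print(self_preference_ratio(by_engine, "google", "google"))  # -> 2.0
```

A ratio above 1 means the engine links to its owner's services more often than its rivals do. The "more than three times as often" finding corresponds to a ratio above 3 on the real data.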

For selected keywords, biased results advance search engines’ interests at users’ expense: We demonstrate that lower-ranked listings for other sites sometimes manage to obtain more clicks than Google and Yahoo’s own-site listings, even when Google and Yahoo put their own links first.
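
Top positions ordinarily draw the most clicks. So when a lower-ranked rival listing draws more clicks than the engine's own higher-ranked listing, users are likely preferring the rival. The sketch below flags such inversions. It assumes per-listing click counts for a keyword are available, and the `Listing` type and sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Listing:
    rank: int        # 1 = top of the page
    url: str
    own_site: bool   # True if the engine is linking to its own service
    clicks: int

def inverted_own_listings(listings: List[Listing]) -> List[Tuple[Listing, Listing]]:
    """Pairs (own, other) where `own` ranks above `other` yet gets fewer clicks."""
    pairs = []
    for own in (l for l in listings if l.own_site):
        for other in listings:
            if not other.own_site and other.rank > own.rank and other.clicks > own.clicks:
                pairs.append((own, other))
    return pairs

# Hypothetical click counts for one keyword (not data from the study).
listings = [
    Listing(1, "https://engine.example/own-service", True, 120),
    Listing(2, "https://competitor.example/", False, 310),
    Listing(3, "https://another.example/", False, 90),
]
flagged = inverted_own_listings(listings)
for own, other in flagged:
    print(f"rank {other.rank} drew {other.clicks} clicks vs. {own.clicks} for own-site rank {own.rank}")
```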

Details, including methodology, analysis, and policy implications:

Measuring Bias in “Organic” Web Search

Bias in Search Results?: Diagnosis and Response

Edelman, Benjamin. “Bias in Search Results?: Diagnosis and Response.” Indian Journal of Law and Technology 7 (2011): 16-32.

I explore allegations of search engine bias, including understanding a search engine’s incentives to bias results, identifying possible forms of bias, and evaluating methods of verifying whether bias in fact occurs. I then consider possible legal and policy responses, and I assess search engines’ likely defenses. I conclude that regulatory intervention is justified in light of the importance of search engines in referring users to all manner of other sites, and in light of striking market concentration among search engines.

Knowing Certain Trademark Ads Were Confusing, Google Sold Them Anyway — for $100+ Million

Disclosure: I serve as a consultant to various companies that compete with Google. But I write on my own — not at the suggestion or request of any client, without approval or payment from any client.

When a user enters a search term that matches a company’s trademark, Google often shows results for the company’s competitors. To take a specific example: Searches for language software seller “Rosetta Stone” often yield links to competing sites — sometimes, sites that sell counterfeit software. Rosetta Stone thinks that’s rotten, and, as I’ve previously written, I agree: It’s a pure power-play, effectively compelling advertisers to pay Google if they want to reach users already trying to reach their sites; otherwise, Google will link to competitors instead. Furthermore, Google is reaping where others have sown: After an advertiser builds a brand (often by advertising in other media), Google lets competitors skim off that traffic — reducing the advertiser’s incentive to invest in the first place. So Google’s approach to trademarks definitely harms advertisers and trademark-holders. But it’s also confusing to consumers. How do we know? Because Google’s own documents admit as much.

Today Public Citizen posted an unredacted version of Rosetta Stone’s appellate brief in its ongoing litigation with Google. Google had sought to keep confidential the documents that ground district court and appellate adjudication of the dispute, but now some of the documents are available — giving an inside look at Google’s policies and objectives for trademark-triggered ads. Some highlights:

  • Through early 2004, Google let trademark holders request that ads be disabled if they used a trademark in keyword or ad text. But in early 2004, Google determined that it could achieve a "significant potential revenue impact" from selling trademarks as keywords. (ref)
  • In connection with Google’s 2004 policy change letting advertisers buy trademarks as keywords, Google conducted experiments to assess user confusion from trademarks appearing in search advertisements. Google concluded that showing a trademark anywhere in the text of an advertisement resulted in a "high" degree of consumer confusion. Google’s study concluded: "Overall very high rate of trademark confusion (30-40% on average per user) … 94% of users were confused at least once during the study." (ref)
  • Notwithstanding Google’s 2004 study, Google in 2009 changed its trademark policy to permit the use of trademarks in advertisement text. Google estimated that this policy change would result in at least $100 million of additional annual revenue, and potentially more than a billion dollars of additional annual revenue. Google implemented this change without any further studies or experiments as to consumer confusion. (ref)
  • Google possesses more than 100,000 pages of complaints from trademark holders, including at least 9,862 complaints from at least 5,024 trademark owners from 2004 to 2009. (ref)
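
The two confusion figures in Google's study measure different things. The excerpt does not say how Google computed them, so the sketch below is one plausible reading: a mean per-user confusion rate, and the share of users confused at least once. The data are invented. The point is that a moderate per-user rate can coexist with most users being confused at some point.

```python
from typing import Dict, List, Tuple

def confusion_summary(trials_by_user: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Return (mean per-user confusion rate, share of users confused at least once).

    Each list holds one entry per trial: True if that user was confused on that trial.
    """
    users = [trials for trials in trials_by_user.values() if trials]
    if not users:
        return 0.0, 0.0
    per_user = [sum(t) / len(t) for t in users]
    mean_rate = sum(per_user) / len(per_user)
    ever_confused = sum(any(t) for t in users) / len(users)
    return mean_rate, ever_confused

# Invented trials for four users, four trials each.
sample = {
    "u1": [True, False, False, False],
    "u2": [True, True, False, False],
    "u3": [False, False, True, False],
    "u4": [False, False, False, False],
}
print(confusion_summary(sample))  # -> (0.25, 0.75)
```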

Kudos to Public Citizen for obtaining these documents. That said, I believe Google should never have sought to limit distribution of these documents in the first place. In other litigation, I’ve found that Google’s standard practice is to attempt to seal all documents, even where applicable court rules require that documents be provided to the general public. That’s troubling, and that needs to change.

Hard-Coding Bias in Google “Algorithmic” Search Results

I present categories of searches for which available evidence indicates Google has “hard-coded” its own links to appear at the top of algorithmic search results, and I offer a methodology for detecting certain kinds of tampering by comparing Google results for similar searches. I compare Google’s hard-coded results with Google’s public statements and promises, including a dozen denials but at least one admission. I conclude by analyzing the impact of Google’s tampering on users and competition, and by proposing principles to block Google’s bias.
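
The comparison of similar searches can be sketched as follows. It assumes near-identical query variants that a neutral algorithm should treat alike. The example pair, a keyword with and without a trailing comma, is a hypothetical illustration and not necessarily a variant used in the article. It also assumes the own-service domains are known. A flag marks a lead for manual review, not proof.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit

def own_link_position(results: List[str], own_domains: Set[str]) -> Optional[int]:
    """1-based rank of the first result on one of own_domains, or None if absent."""
    for rank, url in enumerate(results, start=1):
        host = (urlsplit(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in own_domains):
            return rank
    return None

def flag_possible_hard_coding(variants: Dict[str, List[str]], own_domains: Set[str]) -> bool:
    """Flag when an own link is #1 for one query variant but not for another.

    If the variants should be treated identically, a top position that
    vanishes under a trivial query change suggests manual placement.
    """
    positions = [own_link_position(r, own_domains) for r in variants.values()]
    return 1 in positions and any(p != 1 for p in positions)

# Hypothetical result lists for two near-identical queries.
variants = {
    "email": ["https://mail.google.com/", "https://mail.yahoo.com/", "https://outlook.com/"],
    "email,": ["https://mail.yahoo.com/", "https://outlook.com/", "https://mail.google.com/"],
}
print(flag_possible_hard_coding(variants, {"google.com"}))  # -> True
```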

Details, including screenshots, methodology, proposed regulatory response, and analogues in other industries:

Hard-Coding Bias in Google “Algorithmic” Search Results

A Closer Look at Google’s Advertisement Labels

Google’s tiny ‘Ads’ label

The FTC has called for “clear and conspicuous disclosures” in advertisement labels at search engines, and the FTC specifically emphasized the need for “terms and a format that are easy for consumers to understand.” Unfortunately, Google’s new advertisement labels fail this test: Google’s “Ads” label is the smallest text on the page, far too easily overlooked. (Indeed, as I show in the image at left, the “Ads” label substantially fits within an “o” in “Google.”) Meanwhile, Google now merges algorithmic and advertisement results within a single set of listings; Google’s “Help” explanations are inaccurate; and Google uses inconsistent labels mere inches apart within search results, as well as across services.

Details, including the shortfalls, screenshots, comparisons, and proposed alternatives:

A Closer Look at Google’s Advertisement Labels


Labels and Disclosures in Search Advertising with Duncan Gilchrist

Disclosure: I serve as a consultant to various companies that compete with Google. But I write on my own — not at the suggestion or request of any client, without approval or payment from any client.

Search engines have long labeled their advertisements with labels like “Sponsored links”, “Sponsored results”, and “Sponsored sites.” Do users actually know that these labels are intended to convey that the listings are paid advertisements? In a draft paper we’re posting today, Duncan Gilchrist and I try to find out.

“Sponsored Links” or “Advertisements”?: Measuring Labeling Alternatives in Internet Search Engines

In an online experiment, we measure users’ interactions with search engines, both in standard configurations and in modified versions with improved labels identifying search engine advertisements. In particular, for a random subset of users, we change “sponsored link” labels to instead read “paid advertisement.” We find that users receiving the “paid advertisement” label click 25% to 33% fewer advertisements and correctly report that they click fewer advertisements, controlling for the number of advertisements they actually click. Results are most pronounced for commercial searches, and for users with low income, low education, and little online experience.
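
A minimal sketch of the core comparison follows: users are randomly assigned to one of the two labels, and the relative drop in ad clicks is then computed. The function names and click counts are hypothetical. The paper's full analysis (including its controls and the self-reported click measure) is not reproduced here. On this measure, "25% to 33% fewer advertisements" corresponds to a relative reduction between 0.25 and 0.33.

```python
import random
from statistics import mean
from typing import Sequence

LABELS = ("Sponsored link", "Paid advertisement")  # control, treatment

def assign_label(user_id: str, seed: int = 0) -> str:
    """Randomly but reproducibly assign a user to one label condition."""
    rng = random.Random(f"{seed}:{user_id}")
    return rng.choice(LABELS)

def relative_reduction(control_clicks: Sequence[int], treatment_clicks: Sequence[int]) -> float:
    """Fractional drop in mean ad clicks per user, treatment relative to control."""
    control_mean = mean(control_clicks)
    return (control_mean - mean(treatment_clicks)) / control_mean

# Invented per-user ad-click counts, one list per condition.
control = [4, 3, 5, 4]     # saw "Sponsored link"
treatment = [3, 3, 2, 4]   # saw "Paid advertisement"
print(relative_reduction(control, treatment))  # -> 0.25
```

Seeding per user keeps each user's assignment stable across visits, so a returning user always sees the same label.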

We consider our findings particularly timely in light of Google’s change, just last week, to label many of its advertisements as “Ads.” On one view, “Ads” is an improvement – probably easier for unsophisticated consumers to understand. Yet it’s a strikingly tiny label – the smallest text anywhere in Google’s search results, and about a quarter as many pixels as the corresponding disclosure on other search engines. As our paper points out, FTC litigation has systematically sought the label “Paid Advertisement,” and we still think that’s the better choice.

Tying Google Affiliate Network

Disclosure: I serve as co-counsel in unrelated litigation against Google, Vulcan Golf et al. v. Google et al. I also serve as a consultant to various companies that compete with Google. But I write on my own — not at the suggestion or request of any client, without approval or payment from any client.

In one of the few areas of Internet advertising where Google is not dominant – where just three years ago Google had no offering at all – Google now uses tying to climb towards a position of dominance. In particular, using its control over web search, Google offers preferred search ad placement and superior search ad terms to the advertisers who agree to use Google Affiliate Network. Competing affiliate networks cannot match these benefits, and Google’s bundling strategy threatens to grant Google a position of power in yet another online advertising market.

Google shows algorithmic search results at the left side of users’ screens, while Google’s “AdWords” ads appear at the right and, often, top. Historically, Google has sold search ads on a cost-per-click basis: An advertiser is charged each time a user clicks its ad. With these offerings, Google has grown to a position of dominance in search and in search advertising — a 77% share of U.S. web search, with even higher levels in other countries.

While Google dominates online search, Google to date has made less headway in the area of affiliate marketing, an approach to online advertising wherein small to midsized sites (“affiliates”) receive payments if users click links and make purchases from the corresponding merchants. For example, Gap pays a 2% to 4% commission if a user clicks an affiliate link to Gap and goes on to make a purchase. While almost all of the web’s largest merchants run affiliate programs, as of the start of 2007 Google offered no affiliate marketing services. Only through its mid-2007 acquisition of DoubleClick did Google obtain an affiliate marketing program, then called Performics and now renamed Google Affiliate Network (GAN). But Google’s affiliate network began in third place in the US market — behind larger competitors Commission Junction and LinkShare.

Google now grants GAN advertisers preferred placement in search results. Notice that the three GAN ads appear with images, whereas ordinary AdWords ads show only text. And Google places all GAN image ads at the top of the right rail -- above all right-side AdWords ads. Beginning in November 2009, Google’s Product Listing Ads service gave GAN major advantages over competing affiliate networks. Within search ads, Google now includes listings not just to Google’s AdWords pay-per-click advertisers, but also to GAN advertisers. Through these placements, Google offers GAN advertisers four striking and valuable benefits:

  • Image ads. AdWords advertisements show only text. But GAN advertisements include an image — making GAN offers stand out in search results. See the three image ads highlighted in red in the screenshot at right.
  • Preferred placement. AdWords advertisements are ordered, Google says, based on how much each advertiser bids as well as Google’s assessment of ad relevance, click-through rate, and other factors known only to Google. But in my testing, all GAN ads appear at the top of the “right rail” of side listings — prominent, highly visible screen space that gets more attention than any AdWords listings below. Indeed, by pushing AdWords ads further down the page, GAN ads reduce the value of the AdWords slots. In the screenshot at right, notice that all three GAN image ads appear above all the right-rail AdWords ads.
  • Conversion-contingent payment. AdWords advertisers continue to pay on a per-click basis, incurring costs as soon as a user clicks a link. In contrast, GAN advertisers only have to pay if a user clicks a link and purchases a product.
  • Preferred payment terms. Because AdWords advertisers pay as soon as a user clicks, they must pay for users’ clicks even if servers malfunction, even if credit card processors reject users’ charges, and even if users return their orders or initiate chargebacks. In contrast, in all these circumstances, GAN advertisers incur no advertising costs at all.
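
The payment-terms gap can be made concrete with simple arithmetic. Under per-click billing, every click is charged, including clicks that hit a malfunctioning server or lead to returned or charged-back orders. Under conversion-contingent billing, only completed sales incur a commission. The sketch uses hypothetical figures: a $0.50 cost per click, $100 orders, and a 4% commission at the top of the Gap range mentioned above. The function names are illustrative.

```python
from typing import List, Tuple

Order = Tuple[str, float]  # (status, order amount in dollars)

def cpc_cost(clicks: int, cost_per_click: float) -> float:
    """AdWords-style billing: every click is charged, whatever happens next."""
    return clicks * cost_per_click

def commission_cost(orders: List[Order], commission_rate: float) -> float:
    """Affiliate-style billing: commission only on orders that stay completed.

    Returned orders and chargebacks incur no advertising cost.
    """
    completed = sum(amount for status, amount in orders if status == "completed")
    return completed * commission_rate

# Hypothetical campaign: 1,000 clicks and 20 $100 orders, of which 2 are returned and 1 charged back.
orders = [("completed", 100.0)] * 17 + [("returned", 100.0)] * 2 + [("chargeback", 100.0)]
print(cpc_cost(1000, 0.50), commission_cost(orders, 0.04))
```

With these numbers, the per-click advertiser pays $500 regardless of outcomes. The conversion-contingent advertiser pays about $68 and bears none of the risk of the failed orders.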

I expect Google will argue that it is within its rights to package, bundle, and tie its products as it sees fit. I disagree. Here, Google ties its search offering to its affiliate network without an apparent pro-competitive purpose but with obvious anti-competitive effects. In particular, tying affiliate network services to preferred search ad format and placement gives GAN an advantage over competing affiliate networks, without efficiencies or other countervailing benefits to users or advertisers.

Furthermore, there is no plausible justification for providing image ads only to GAN advertisers or for granting all GAN ads positions above all right-side AdWords ads. To the contrary, Google could easily allow all AdWords ads to include images, and Google could instead intersperse GAN ads (and ads from other affiliate networks) among AdWords advertisements in whatever order auctions and algorithms fairly deem optimal. Those would be the natural product design decisions if Google genuinely sought to include images wherever useful and if Google genuinely sought to include affiliate ads whenever relevant. Because Google instead reserves these benefits for GAN advertisers, the natural inference is that Google reserves special rewards for advertisers choosing GAN — benefits that come at the expense of genuine competition in affiliate marketing services.
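
The contrast between pinning GAN ads on top and interleaving all ads by auction outcome can be sketched as follows. Google says AdWords ordering reflects bids plus relevance, click-through rate, and other undisclosed factors. The single `score` field here is a hypothetical stand-in for that combination, and the ad names and values are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Ad:
    name: str
    network: str   # "adwords", "gan", or another affiliate network
    score: float   # hypothetical stand-in for Google's bid-and-quality ranking

def rank_pinned(ads: List[Ad], pinned_network: str = "gan") -> List[Ad]:
    """Pattern described above: every pinned-network ad sits above all other ads."""
    by_score = sorted(ads, key=lambda a: a.score, reverse=True)
    return ([a for a in by_score if a.network == pinned_network]
            + [a for a in by_score if a.network != pinned_network])

def rank_by_score(ads: List[Ad]) -> List[Ad]:
    """Neutral alternative: all ads, whatever their network, interleaved by score."""
    return sorted(ads, key=lambda a: a.score, reverse=True)

ads = [Ad("A", "adwords", 9.0), Ad("B", "gan", 4.0),
       Ad("C", "adwords", 6.0), Ad("D", "other-affiliate", 5.0)]
print([a.name for a in rank_pinned(ads)])    # -> ['B', 'A', 'C', 'D']
print([a.name for a in rank_by_score(ads)])  # -> ['A', 'C', 'D', 'B']
```

In the pinned ordering, the lowest-scoring ad takes the top slot only because of its network. Under neutral ranking, it would fall to last place.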

In the remainder of this piece, I discuss why the public should be concerned about Google’s tying tactics, then assess Google’s tying-based promotion of its various other products. I conclude with brief policy prescriptions.

Cause for Concern

I see four major reasons for concern in Google’s decision to tie GAN to preferred placements, format, and terms in sponsored search.

First, GAN’s tying threatens to extend Google’s dominance into yet another facet of online advertising. Google’s dominance in search and search advertising is well-known. But affiliate marketing is a rare area where, until recently, Google had little or no presence. By leveraging its dominance in search to take over yet another type of online advertising, Google will importantly limit advertisers’ options. Today, advertisers unhappy with Google’s AdWords prices or rules can consider working with independent web sites through affiliate programs not operated by Google. But if Google comes to dominate affiliate marketing, then even affiliate marketing will become unavailable to advertisers dissatisfied with Google. Indeed, knowing that it dominates multiple aspects of online advertising, Google will be in a position to raise prices that much further.

Second, GAN’s tying harms those AdWords advertisers who refuse GAN and buy only pay-per-click ads from Google. The more GAN ads Google puts above ordinary AdWords listings, the less visible AdWords advertisers become. AdWords advertisers are at a further disadvantage when Google gives image ads to GAN advertisers but not AdWords advertisers, and when Google offers preferred terms (e.g. refunds of advertising costs if a user returns a product) to GAN advertisers but not AdWords advertisers. Google promises that “the highest ranked ad is displayed in the most prominent position,” but when Google gives GAN ads the top positions, ordinary AdWords advertisers are left bidding on the leftovers. And as Google makes its left-side listings increasingly visual — inline maps, images, product pictures, video thumbnails, and more — advertisers need images to capture users’ attention. So AdWords-only advertisers, without image-based ads, end up at a significant disadvantage.

Third, for nearly a year Google has offered the Product Listing Ads benefits in “limited beta” available only to “a small number of participants” Google selects. In fact I’ve seen numerous advertisers, large and small, promoted in Product Listing Ads. But it is striking to see Google offer preferred listings only to those advertisers Google chooses to favor. Elsewhere Google argues that its auction-based ad sales are “equitable.” But when Google gives superior placement to its preferred advertisers, for nearly a year, Google’s rules seem the opposite of fair.

Finally, GAN’s tying is particularly worrisome in the context of other Google tactics. As detailed in the next section, Google uses and has used bundling and tying to enter and dominate numerous markets. If these tactics continue unchecked, we face a future where Google’s dominance stretches even further.

Google’s Tying Strategy More Broadly

Tying GAN to search is just one example of Google’s oft-repeated tactic of forcing customers who want one Google service to accept additional Google services too. This section presents a series of such examples.

Throughout, these tying examples fit the following form:

A [user type] who wants [desirable Google service] must also accept [unwanted Google service].

I now turn to specifics.

Tying to promote affiliate marketing services: An advertiser who wants top placement in Google search advertisements, image ads, and preferred payment terms must join Google Affiliate Network.

Details: See above.

Tying to promote low-quality syndicated search marketing services: An advertiser who wants placement through high-quality Google Search Network sites must also accept low-quality Google Search Network placements.

Details: Google’s Search Network includes some top-quality publishers such as AOL Search and New York Times. But if an advertiser contracts to advertise through Google Search Network, Google demands permission to also place the advertiser’s ads on whatever other sites Google selects, in whatever quantity Google chooses. Many of these placements are low-quality or worthless, including spyware popups, typosquatting sites, and deceptive toolbars. Many of these placements trick advertisers into believing they are receiving valuable traffic when in fact the traffic consists of users the advertisers had already reached or would receive anyway. Even if an advertiser learns about these problems, the advertiser must continue to pay for this traffic, on pain of losing access to Google’s high-quality search partners.

Tying to promote vertical search: A user who wants Google’s core algorithmic search results must also accept Google’s own vertical search results.

Details: Users relish Google’s highly-regarded algorithmic search results. But a user running a search at Google also receives Google’s vertical search services: Whether the user prefers Bing Maps, Google Maps, Mapquest, or Yahoo Maps, Google Search always presents inline maps from Google, and so too for images, local businesses, products, scholarly articles, videos, and more. On one view, these vertical search services are an integral part of Google’s offering, but scores of competing vendors reflect a competing vision of users choosing core algorithmic search separately from vertical search services. By granting its special-purpose search services preferred placement, Google sharply reduces traffic to competing vertical search services.

Tying to promote ancillary mobile services: A mobile phone developer who wants Google’s Android certification and access to Android Market application store must also accept Google’s ancillary services, including geolocation.

Details: In a September 2010 complaint, Skyhook alleges that Google ordered Motorola not to ship a proposed device that would have included both Google Location Service and Skyhook’s XPS service, two distinct methods to determine a user’s geographic location. Skyhook claims that Google grounded its threat in Google’s Android Market application store: If Motorola shipped a device with software Google did not approve, Google would ban users of that device from accessing Android Market or running the apps available there. By requiring that Motorola omit Skyhook’s service in order to give users access to Android Market, Google denied users access to Skyhook.

Policy Prescriptions

Advertisers, consumers, policy-makers and the concerned public should give tying relationships a careful look. In principle, bundling previously-separate offerings can offer useful synergies and efficiencies. But bundling can also let a company expand from strength in one area into dominating numerous additional fields — limiting choice, raising prices, and reducing innovation.

In some instances, it may not be obvious how to separate bundled products. For example, there is currently no single clear mechanism whereby Google search results could embed maps, product feeds, or other structured or interactive information from other search services. Pending a compelling plan to unbundle vertical results from core search, my instinct is to save this problem for later — albeit perhaps requiring disclosure of favored treatment Google gives its own search services, or limiting the permissible extent of such favored treatment.

In other instances, market structure and product design yield a natural vision of products that could be separate, generally are separate, and should rightly remain separate. To my eye, these principles ring particularly true in the separation between search marketing and affiliate marketing. There is no logical reason why GAN advertisers should enjoy the only listings with images. Nor is there any logical reason why all GAN ads should appear above all right-side AdWords ads. When Google grants its GAN advertisers these special benefits, the best conclusion is that Google is using its dominance in search to establish dominance in affiliate marketing — seizing an unearned advantage over competing affiliate marketing services. These exclusionary tactics are unjustified and improper, and they ought not be permitted.

Google’s first step should be to cease tying Google Affiliate Network to preferred search placement, format, and terms: An advertiser seeking to include image ads should not have to sign up with GAN, nor should GAN ads arbitrarily appear above competitors. A recent post at Channel Dollars off-handedly reports that Product Listing Ads “has been taken out” of GAN and “is being merged into” AdWords. That’s a fair start. But even temporary ties can impede competition, and Google has delivered these large benefits only to GAN advertisers for some ten months.

Meanwhile, Google’s preferred treatment of selected GAN advertisers foreshadows a worrisome future. If Google can give preferred treatment to advertisers who use GAN, what prevents preferred treatment of advertisers who support Google’s regulatory agenda, and inferior treatment of advertisers who complain to policy-makers? Indeed, I doubt that Google invited to Product Listing Ads any advertisers who have publicly criticized Google’s practices. Google’s ability to distribute valuable but opaque favors to preferred advertisers — and to withhold such favors from anyone Google dislikes — makes Google’s power that much stronger and, to my eye, that much more troubling.

Facebook Leaks Usernames, User IDs, and Personal Details to Advertisers updated May 26, 2010

Browse Facebook, and you wouldn’t expect Facebook’s advertisers to learn who you are. After all, Facebook’s privacy policy and blog posts promise not to share user data with advertisers except when users grant specific permission. For example, on April 6, 2010 Facebook’s Barry Schnitt promised: “We don’t share your information with advertisers unless you tell us to (e.g. to get a sample, hear more, or enter a contest). Any assertion to the contrary is false. Period.”

My findings are exactly the contrary: Merely clicking an advertiser’s ad reveals to the advertiser the user’s Facebook username or user ID. With default privacy settings, the advertiser can then see almost all of a user’s activity on Facebook, including name, photos, friends, and more.

In this article, I show examples of Facebook’s data leaks. I compare these leaks to Facebook’s privacy promises, and I point out that Facebook has been on notice of this problem for at least eight months. I conclude with specific suggestions for Facebook to fix this problem and prevent its recurrence.

Details of the Data Leak

Facebook’s data leak is straightforward: Consider a user who clicks a Facebook advertisement while viewing her own Facebook profile, or while viewing a page linked from her profile (e.g. a friend’s profile or a photo). Upon such a click, Facebook provides the advertiser with the user’s Facebook username or user ID.

Facebook leaks usernames and user IDs to advertisers because Facebook embeds usernames and user IDs in URLs which are passed to advertisers through the HTTP Referer header. For example, my Facebook profile URL is http://www.facebook.com/bedelman. Notice my username (yellow).

Of course, it would be incorrect to assume that a person looking at a given profile is in fact the owner of that profile. A request for a given profile might reflect that user looking at her own profile, but it might instead be some other user looking at the user’s profile. However, when a user views her own profile page, Facebook automatically embeds a “profile” tag (green) in the URL:

http://www.facebook.com/bedelman?ref=profile

Furthermore, when a user clicks from her profile page to another page, the resulting URL still bears the user’s own user ID or username, along with the details of the later-requested page. For example, when I view a friend’s profile, the resulting URL is as shown below. Notice the continued reference to my username (yellow) and the fact that this is indeed my profile (green), along with an appendage naming the user whose page I am now viewing (blue).

http://www.facebook.com/bedelman?ref=profile#!/pacoles

Each of these URLs is passed to advertisers whenever a user clicks an ad on Facebook. For example, when I clicked a Livingsocial ad on my own profile page, Facebook redirected me to the advertiser, yielding the following traffic to the advertiser’s server. Notice the transmission in the Referer header (red) of my username (yellow) and the fact that I was viewing my own profile page (green).

GET /deals/socialads_reflector?do_not_redirect=1&preferred_city=152&ref=AUTO_LOWE_Deals_1273608790_uniq_bt1_b100_oci123_gM_a21-99 HTTP/1.1
Accept: */*
Referer: http://www.facebook.com/bedelman?ref=profile
Host: livingsocial.com
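
To show how little work an advertiser needs to identify the clicking user, here is a minimal sketch of server-side Referer parsing. It assumes profile URLs shaped like those above (facebook.com/<username>), plus the facebook.com/profile.php?id=<user ID> form for users without a username. That second form is an assumption for illustration, and the function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

def parse_facebook_referer(raw_request: str) -> Tuple[Optional[str], bool]:
    """Hypothetical advertiser-side check: return (username_or_id, is_own_profile)."""
    for line in raw_request.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "referer":
            continue
        parts = urlsplit(value.strip())
        host = (parts.hostname or "").lower()
        if host != "facebook.com" and not host.endswith(".facebook.com"):
            return None, False
        query = parse_qs(parts.query)
        if parts.path == "/profile.php":          # assumed numeric-ID URL form
            ident = query.get("id", [None])[0]
        else:                                     # facebook.com/<username>
            ident = parts.path.strip("/") or None
        # "ref=profile" marks that the viewer is looking at her OWN profile
        is_own = "profile" in query.get("ref", [])
        return ident, is_own
    return None, False

SAMPLE_REQUEST = (
    "GET /deals/socialads_reflector?do_not_redirect=1&preferred_city=152"
    "&ref=AUTO_LOWE_Deals_1273608790_uniq_bt1_b100_oci123_gM_a21-99 HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Referer: http://www.facebook.com/bedelman?ref=profile\r\n"
    "Host: livingsocial.com\r\n"
    "\r\n"
)
print(parse_facebook_referer(SAMPLE_REQUEST))  # -> ('bedelman', True)
```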

The same transmission occurs when a user clicks from her profile page to a friend’s page. For example, I clicked through to a friend’s profile, http://www.facebook.com/bedelman?ref=profile#!/pacoles, where I clicked another Livingsocial ad. Again, Facebook’s redirect caused my browser to transmit in its Referer header (red) my username (yellow), and the fact that that username reflects my personal profile (green). Interestingly, my friend’s username was omitted from the transmission because it occurred after a pound sign, causing it to be automatically removed from Referer transmission.

GET /deals/socialads_reflector?do_not_redirect=1&preferred_city=152&ref=AUTO_LOWE_Deals_1273608790_uniq_bt1_b100_oci123_gM_a21-99 HTTP/1.1
Accept: */*
Referer: http://www.facebook.com/bedelman?ref=profile
Host: livingsocial.com
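
The pound-sign behavior follows from HTTP itself. RFC 7231, section 5.5.2, forbids the Referer header from including a URI's fragment (everything after "#") or its userinfo. The sketch below approximates the Referer a browser derives from a page URL by stripping only the fragment, which is the case at issue here. It shows why the friend's username disappeared while the owner's username and "ref=profile", which sit before the "#", did not. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def referer_for(page_url: str) -> str:
    """Approximate the Referer a browser sends for page_url: the URL minus its #fragment."""
    return urlunsplit(urlsplit(page_url)._replace(fragment=""))

friend_page = "http://www.facebook.com/bedelman?ref=profile#!/pacoles"
print(referer_for(friend_page))  # -> http://www.facebook.com/bedelman?ref=profile
```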

In further testing, I confirmed that the same transmission occurs when a user clicks from her profile page to a photo page, or to any of various other pages linked from a user’s profile.

With a Facebook member’s username or user ID, current Facebook defaults allow an advertiser (and anyone else) to obtain a user’s name, gender, other profile data, picture, friends, networks, wall posts, photos, and likes. Furthermore, the advertiser already knows the user’s basic demographics, since the advertiser knows the user fits the profile the advertiser had requested from Facebook. For example, in grey highlighting above, the advertiser learned from Facebook my age, gender, and geographic location.

Facebook’s Contrary Statements about User Privacy vis-a-vis Advertisers

Facebook has made specific promises as to what information it will share with advertisers. For one, Facebook’s privacy policy promises “we do not share your information with advertisers without your consent” (section 5). Then, in section 7, Facebook lists eleven specific circumstances in which it may share information with others — but none of these circumstances applies to the transmission detailed above.

Facebook’s recent blog postings also deny that Facebook shares users’ identities with advertisers. In an April 6, 2010 post, Facebook promised: “We don’t share your information with advertisers unless you tell us to (e.g. to get a sample, hear more, or enter a contest). Any assertion to the contrary is false. Period.” Facebook’s prior postings were similar. July 1, 2009: “Facebook does not share personal information with advertisers except under the direction and control of a user. … You can feel confident that Facebook will not share your personal information with advertisers unless and until you want to share that information.” December 9, 2009: “Facebook never shares personal information with advertisers except under your direction and control.” As to all these claims, I disagree. Sharing a username or user ID upon a single click, without any disclosure or indication that such information will be shared, is not at a user’s direction and control.

Facebook Has Been on Notice of This Problem for Eight Months

AT&T Labs researcher Balachander Krishnamurthy and Worcester Polytechnic Institute professor Craig Wills previously identified the general problem of social networks leaking user information to advertisers, including leakage through the Referer headers detailed above. In August 2009, their On the Leakage of Personally Identifiable Information Via Online Social Networks was posted to the web and presented at the Workshop on Online Social Networks (WOSN).

Through Krishnamurthy and Wills’ research, Facebook eight months ago received actual notice of the data leakage at issue. A September 2009 MediaPost article confirms Facebook’s knowledge through its spokesperson’s response. However, Facebook spokesperson Simon Axten substantially understated the severity of the data leak: Axten commented “The average Facebook user views a number of different profile pages over the course of a session …. It’s thus difficult for a tracking website to know whether the identifier belongs to the person being tracked, or whether it instead belongs to a friend or someone else whose profile that person is viewing.” I emphatically disagree. As shown above, when a user views her own profile, or a page linked from her own profile, the “?ref=profile” tag is added to the URL — exactly confirming the identity of the profile owner.

What Facebook Should Do

Since receiving actual notice of these data leaks, Facebook has implemented scores of new features for advertising, monetization, information-sharing, and reorganization. Inexplicably, Facebook has failed to address leakage of user information to advertisers. That’s ill-advised and short-sighted: Users don’t expect ad clicks to reveal their names and details, and Facebook’s privacy policy and blog posts promise to honor that expectation. So Facebook needs to adjust its actual practices to meet its promises.

Preventing advertisers from receiving usernames and user IDs is strikingly straightforward: A modified redirect can mask referring URLs. Currently, Facebook uses a simple HTTP 301 redirect, which preserves referring URLs — exactly creating the problem detailed above. But a FORM POST redirect, META REFRESH redirect, or JavaScript redirect could conceal referring URLs — preventing advertisers from receiving username or user ID information.
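The difference between the two redirect styles can be sketched as two hypothetical Python handlers that each return a status, headers, and body. With a 301, the browser follows the Location header and generally keeps the Referer of the page where the click happened (here, the profile page). With an intermediate page, the navigation to the advertiser starts from the redirect page itself. The advertiser therefore sees at most that page’s URL, and some browsers send no Referer at all after a META REFRESH. Exact browser behavior varies. The function names and advertiser.example URLs are illustrative assumptions, not Facebook’s implementation.

```python
import html
import json


def redirect_301(destination: str):
    # Plain HTTP redirect: browsers typically forward the original
    # page's URL (e.g. a profile page) as the Referer to the destination.
    return 301, {"Location": destination}, ""


def redirect_masked(destination: str):
    # Intermediate page: navigation to the advertiser originates here, so
    # the Referer is at most this page's URL, not the profile page's URL.
    url_attr = html.escape(destination, quote=True)
    # Escape "<" so a crafted URL cannot close the <script> element early.
    url_js = json.dumps(destination).replace("<", "\\u003c")
    body = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="0;url={url_attr}">'
        f"<script>window.location.replace({url_js});</script>"
        "</head><body></body></html>"
    )
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body
```

The sketch pairs META REFRESH with a JavaScript fallback, two of the methods named above. Only the response from the ad click-redirect script changes, which is why this fix is localized.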

Instead, Facebook has partially implemented the pound sign method described above — putting some, but not all, sensitive information after a pound sign, with the result that sometimes this information is not transmitted as a Referer. If fully implemented across the Facebook site, this approach might prevent the data leakage I uncovered. However, in my testing, numerous within-Facebook links bypass the pound sign masking. In any event, an improved redirect would be much simpler to implement — requiring only a single adjustment to the ad click-redirect script, rather than requiring changes to URL formats across the Facebook site.
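The pound-sign approach works because HTTP forbids sending a URL’s fragment in the Referer. RFC 7231, section 5.5.2, says the Referer “MUST NOT include the fragment and userinfo components” of the referring URI. The hypothetical Python helper below computes the Referer a compliant browser would send for a given page URL, which shows both the masking and how a link that avoids the pound sign still leaks its query string. The sample URLs assume 2010-era Facebook formats.

```python
from urllib.parse import urlsplit, urlunsplit


def referer_for(page_url: str) -> str:
    """Hypothetical helper: the Referer a compliant browser sends for page_url."""
    parts = urlsplit(page_url)
    # Drop any userinfo ("user:pass@") from the network location.
    netloc = parts.netloc.rpartition("@")[2]
    # Drop the fragment: everything after "#" never leaves the browser.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


# Masked: profile details sit after "#", so only /home.php is sent.
masked = referer_for("http://www.facebook.com/home.php#!/profile.php?id=12345&ref=profile")
# Unmasked: the same details in the query string are sent in full.
leaked = referer_for("http://www.facebook.com/profile.php?id=12345&ref=profile")
```

Here masked is "http://www.facebook.com/home.php", while leaked is the full profile URL. This is why every within-Facebook link must use the pound-sign format for the approach to work.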

Finally, Facebook should inform users of what has occurred. Facebook should apologize to users, explain why it didn’t live up to its explicit privacy commitments, and establish procedures — at least robust testing, if not full external review — to assure that users’ privacy is correctly protected in the future.

Update – May 26, 2010

On May 20, 2010, the Wall Street Journal reported the problem detailed above. On or about that same day, Facebook removed the ref=profile tags that were the crux of the data leak.

Yesterday I spoke with Arturo Bejar, a Facebook engineer who investigated this problem. Arturo told me that after Krishnamurthy and Wills’ article, he reviewed relevant Facebook systems in search of leakage of user information. At that time, he found none: Facebook revealed the URLs users were browsing when they clicked ads, but did not indicate whether the user clicking a given ad was in fact the owner of the profile showing that ad. However, in a subsequent Facebook redesign, beginning in February 2010, Facebook user home pages received a new “profile” button which carried the ref=profile URL tags I analyze above. Arturo tells me that because this tag was added without a further privacy review, he and others at Facebook did not consider the interaction between this tag and the problem I describe above. Arturo says that’s why this problem occurred despite the prior Krishnamurthy and Wills article.

Arturo also pointed out that the problem I describe did not affect advertisers whose landing pages were pages on Facebook (rather than advertisers’ own external sites).

Meanwhile, Facebook’s May 24 “Protecting Privacy with Referrers” presents Facebook’s view of the problem in greater detail. Facebook’s posting offers a fine analysis of the various methods of redirects and Facebook’s choice among them. It’s worth a read.

After discussing the problem with Arturo and reading Facebook’s new post, I reached a more favorable impression of Facebook’s response. But my view is tempered by Facebook’s ill-advised attempts to downplay the breach.

  • Rather than affirmatively describing the specific design flaw, Facebook’s post describes what “could” “potentially” occur. Facebook’s post never gives a clear affirmative statement of the problem.
  • Facebook says advertisers would need to “infer” a user’s username/ID. But usernames and IDs are sent directly, in clear and unambiguous URLs, hardly requiring complex analysis.
  • Facebook claims that the breach affected only “one case … if a user takes a specific route on the site” (WSJ quote). Facebook also calls the problem “a rarely occurring case” (posting). I dispute these characterizations. It is hardly “rare” for a user to view her own profile. To view her own profile and click an ad? There’s no reason to think that’s any less frequent than clicking an ad elsewhere. To view her own profile, click through to another page, and then click an ad? That’s perfectly standard. Furthermore, although Facebook told the Journal there is “one case” in which data is leaked improperly, in fact I’ve found many such cases including clicking from profile to ad, from profile to friend’s page to ad, and from profile to photo page to ad, to name three.
  • Through transmission in HTTP Referer headers, usernames and IDs appear to reach advertisers’ web servers in a manner such that default server log files would store this data indefinitely, and default analytics would tabulate it accordingly. Facebook says it has “no reason to believe that any advertisers were exploiting” the data breach I reported, but the fact is, this data ends up in a place where advertisers could (and, as to historic data, still can) access it easily, using standard tools, and at their convenience.
  • Although Facebook’s post says the problem is “potential,” I found that a user’s username/ID is sent with each and every click in the affected circumstances.
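The log-file point above can be illustrated with a short, hypothetical Python script. It assumes a server writing the widely used Apache/NCSA “combined” log format, which records each request’s Referer, and it flags logged hits whose Referer is a Facebook URL carrying ref=profile. The function name, log line, and IP address are illustrative assumptions, not data from any actual advertiser.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache/NCSA "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def profile_owner_hits(lines):
    """Yield (time, referer) for logged hits referred with ref=profile."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines not in the combined format
        referer = m.group("referer")
        parts = urlsplit(referer)
        host = parts.hostname or ""
        if host != "facebook.com" and not host.endswith(".facebook.com"):
            continue  # not a Facebook referral
        if parse_qs(parts.query).get("ref") == ["profile"]:
            yield m.group("time"), referer


sample = [
    '203.0.113.7 - - [20/May/2010:10:15:32 -0400] "GET /landing HTTP/1.1" 200 5120 '
    '"http://www.facebook.com/jane.doe?ref=profile" "Mozilla/5.0"',
    '203.0.113.8 - - [20/May/2010:10:16:01 -0400] "GET /landing HTTP/1.1" 200 5120 '
    '"http://www.example.com/" "Mozilla/5.0"',
]
hits = list(profile_owner_hits(sample))
```

Only the first sample line is returned. Nothing beyond a regular expression and the standard URL parser is needed.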

So the problem was substantial, real, and immediate. Facebook errs in suggesting the contrary.

Google Inc. (teaching materials) with Thomas Eisenmann

Edelman, Benjamin, and Thomas R. Eisenmann. “Google Inc.” Harvard Business School Case 910-036, January 2010. (Revised April 2011.) (Winner of ECCH 2011 Award for Outstanding Contribution to the Case Method – Strategy and General Management.) (educator access at HBP.)

Describes Google’s history, business model, governance structure, corporate culture, and processes for managing innovation. Reviews Google’s recent strategic initiatives and the threats they pose to Yahoo, Microsoft, and others. Asks what Google should do next. One option is to stay focused on the company’s core competence, i.e., developing superior search solutions and monetizing them through targeted advertising. Another option is to branch into new arenas, for example, build Google into a portal like Yahoo or MSN; extend Google’s role in e-commerce beyond search, to encompass a more active role as an intermediary (like eBay) facilitating transactions; or challenge Microsoft’s position on the PC desktop by developing software to compete with Office and Windows.

Supplements:

Google Inc. (Abridged) – Case (HBP 910032)

Teaching Materials:

Google Inc. and Google Inc. (Abridged) – Teaching Note (HBP 910050)