Monday, September 19, 2011

Quick links

Some of what has caught my attention recently:
  • "60 percent of Netflix views are a result of Netflix's personalized recommendations" and "35 percent of [Amazon] product sales result from recommendations" ([1] [2])

  • When doing personalization and recommendations, implicit ratings (like clicks or purchases) are much less work and turn out to be highly correlated with what people would say their preferences are if you did ask ([1])

  • Good defaults are important. 95% of people won't change the default configuration even in cases where they clearly should. ([1])

  • MSR says 68% of mobile local searches occur while people are actually in motion, usually in a car or bus. Most are looking for the place they want to go, usually a restaurant. ([1])

  • Google paper on Tenzing, a SQL layer on top of MapReduce that appears similar in functionality to Microsoft's Scope or Michael Stonebraker's Vertica. Most interesting part is the performance optimizations. ([1])

  • Googler Luiz Barroso talks data centers, including giving no love to using flash storage and talking about upcoming networking tech that might change the game. ([1] [2])

  • High quality workers on MTurk are much cheaper than they should be ([1])

  • Most newspapers should focus on being the definitive source for local news and the primary channel to get to small local advertisers ([1] [2])

  • Text messaging charges are unsustainable. Only question is when and how they break. ([1])

  • "If you want to create an educational game focus on building a great game in the first place and then add your educational content to it. If the game does not make me want to come back and play another round to beat my high-score or crack the riddle, your educational content can be as brilliant as it can be. No one will care." ([1])

  • A few claims that it is not competitors' failures, but Apple's skillful dominance of supply chains, that prevents Apple's competitors from successfully copying Apple products. I'm not convinced, but worth reading nonetheless. ([1] [2] [3])

  • Surprising amount of detail about the current state of Amazon's supply chain in some theses out of MIT. Long reads, but good reads. ([1])

  • If you want to do e-commerce in a place like India, you have to build out your own delivery service. ([1])

  • Like desktop search in 2005, Dropbox and other cloud storage products exist because Microsoft's product is broken. Microsoft made desktop search go away in 2006 by launching desktop search that works, and it will make the cloud storage opportunity go away by launching a cloud drive that works. ([1] [2] [3])

  • Just like in 2005, merging two failing businesses doesn't make a working business. Getting AOL all over you isn't going to fix you, Yahoo. ([1] [2])

  • Good rant on how noreply@ e-mail addresses are bad customer service. And then the opposite point of view from Google's Sergey Brin. ([1] [2])

  • Google founder Sergey Brin proposed taking Google's entire marketing budget and allocating it "to inoculate Chechen refugees against cholera" ([1])

  • Brilliant XKCD comic on passwords and how websites should ask people to pick passwords ([1])

Wednesday, September 07, 2011

Blending machines and humans to get very high accuracy

A paper by six Googlers from the recent KDD 2011 conference, "Detecting Adversarial Advertisements in the Wild" (PDF), is a broadly useful example of how to succeed at tasks requiring very high accuracy using a combination of many different machine learning algorithms, high quality human experts, and lower quality human judges.

Let's start with an excerpt from the paper:
A small number of adversarial advertisers may seek to profit by attempting to promote low quality or untrustworthy content via online advertising systems .... [For example, some] attempt to sell counterfeit or otherwise fraudulent goods ... [or] direct users to landing pages where they might unwittingly download malware.

Unlike many data-mining tasks in which the cost of false positives (FP's) and false negatives (FN's) may be traded off, in this setting both false positives and false negatives carry extremely high misclassification cost ... [and] must be driven to zero, even for difficult edge cases.

[We present a] system currently deployed at Google for detecting and blocking adversarial advertisements .... At a high level, our system may be viewed as an ensemble composed of many large-scale component models .... Our automated ... methods include a variety of ... classifiers ... [including] a single, coarse model ... [to] filter out ... the vast majority of easy, good ads ... [and] a set of finely-grained models [trained] to detect each of [the] more difficult classes.

Human experts ... help detect evolving adversarial advertisements ... [through] margin-based uncertainty sampling ... [often] requiring only a few dozen hand-labeled examples ... for rapid development of new models .... Expert users [also] search for positive examples guided by their intuition ... [using a custom] tool ... [and they have] surprised us ... [by] developing hand-crafted, rule-based models with extremely high precision.

Because [many] models do not adapt over time, we have developed automated monitoring of the effectiveness of each ... model; models that cease to be effective are removed .... We regularly evaluate the [quality] of our [human experts] ... both to assess the performance of ... raters and measure our confidence in these assessments ... [We also use] an approach similar to crowd-sourcing ... [to] calibrate our understanding of real user perception and ensure that our system continues to protect the interest of actual users.

I love this approach, using human experts and their intuition to guide, assist, and correct algorithms running over big data. These Googlers used an ensemble of classifiers, trained on labels from experts who focused on the edge cases, and ran them over features extracted from a massive data set of advertisements. They then built custom tools to make it easy for experts to search over the ads, follow their intuition, dig in deep, and fix the hardest cases the classifiers missed. Because the bad guys never quit, the Googlers not only constantly add new models and rules, but also constantly evaluate existing rules, models, and the human experts to make sure they are still useful. Excellent.
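
To make the "margin-based uncertainty sampling" from the excerpt a bit more concrete, here is a minimal Python sketch of the general idea: score every ad with the current model, then send the ads whose scores sit closest to the decision threshold to the human experts for labeling. This is only an illustration of the technique, not the paper's implementation. The function names, the 0.5 threshold, the labeling budget, and the toy scores are all assumptions made up for the example.

```python
from typing import Callable, List, Sequence


def margin(score: float, threshold: float = 0.5) -> float:
    """Distance of a model score from the decision threshold.

    A small margin means the model is unsure about this example.
    The 0.5 threshold is an assumption for this sketch.
    """
    return abs(score - threshold)


def select_for_expert_review(
    ads: Sequence[str],
    score_fn: Callable[[str], float],
    budget: int,
    threshold: float = 0.5,
) -> List[str]:
    """Pick the `budget` ads the model is least certain about.

    `score_fn` is a hypothetical stand-in for a trained classifier that
    returns an estimated probability that an ad is adversarial.
    """
    # Pair each margin with the ad's position so ties keep input order.
    ranked = sorted(
        (margin(score_fn(ad), threshold), index)
        for index, ad in enumerate(ads)
    )
    return [ads[index] for _, index in ranked[:budget]]


if __name__ == "__main__":
    # Toy scores standing in for a real model (made up for illustration).
    toy_scores = {"ad_a": 0.02, "ad_b": 0.48, "ad_c": 0.97, "ad_d": 0.60}
    queue = select_for_expert_review(list(toy_scores), toy_scores.get, budget=2)
    print(queue)
```

The point of ranking by margin is that expert time goes to the examples most likely to change the model, which fits the excerpt's claim that a few dozen hand-labeled examples can be enough to get a new model going.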

I think the techniques described here are applicable well beyond detecting naughty advertisers. For example, I suspect a similar technique could be applied to mobile advertising, a hard problem where limited screen space and attention make relevance critical, but we usually have very little data on each user's interests, each user's intent, and each advertiser. Combining human experts with machines, as these Googlers have done, could be particularly useful in bootstrapping and overcoming sparse and noisy data, two problems that make it so difficult for startups to succeed on problems like mobile advertising.