Wednesday, May 18, 2011

Eli Pariser is wrong

In recent interviews and in his new book, "The Filter Bubble", Eli Pariser claims that personalization limits serendipity and discovery.

For example, in one interview, Eli says, "Basically, instead of doing what great media does, which is push us out of our comfort zone at times and show us things that we wouldn't expect to like, wouldn't expect to want to see, [personalization is] showing us sort of this very narrowly constructed zone of what is most relevant to you." In another, he claims, personalization creates a "distorted view of the world. Hearing your own views and ideas reflected back is comfortable, but it can lead to really bad decisions--you need to see the whole picture to make good decisions."

Eli has a fundamental misunderstanding of what personalization is, leading him to the wrong conclusion. The goal of personalization and recommendations is discovery. Recommendations help people find things they would have difficulty finding on their own.

If you know about something already, you use search to find it. If you don't know something exists, you can't search for it. And that is where recommendations and personalization come in. Recommendations and personalization enhance serendipity by surfacing useful things you might not know about.

That is the goal of Amazon's product recommendations, to help you discover things you did not know about in Amazon's store. It is like a knowledgeable clerk who walks you through the store, highlighting things you didn't know about, helping you find new things you might enjoy. Recommendations enhance discovery and provide serendipity.

It was also the goal of Findory's news recommendations. Findory explicitly sought out news you would not know about, news from a variety of viewpoints. In fact, one of the most common customer service complaints at Findory was that there was too much diversity of views, that people wanted to eliminate viewpoints that they disagreed with, viewpoints that pushed them out of their comfort zone.

Eli's confusion about personalization comes from a misunderstanding of its purpose. He talks about personalization as narrowing and filtering. But that is not what personalization does. Personalization seeks to enhance discovery, to help you find novel and interesting things. It does not seek to just show you the same things you could have found on your own.

Eli's proposed solution is more control. But, as Eli himself says, control is part of the problem: "People have always sought [out] news that fits their own views." Personalization and recommendations work to expand this bubble that people try to put themselves in, to help them see news they would not look at on their own.

Recommendations and personalization exist to enhance discovery. They improve serendipity. If you just want people to find things they already know about, use search or let them filter things themselves. If you want people to discover new things, use recommendations and personalization.

Update: Eli Pariser says he will respond to my critique. I will link to it when he does.

Friday, May 13, 2011

Taking small steps toward personalized search

There are some very useful lessons in a paper from the recent WSDM 2011 conference, "Personalizing Web Search using Long Term Browsing History" (PDF).

First, they focused on a simple and low risk approach to personalization, reordering results below the first few. There are a lot of what are essentially ties in the ranking of results after the first 1-2 results; the ranker cannot tell the difference between the results and is ordering them arbitrarily. Targeting the results the ranker cannot differentiate is not only low risk, but more likely to yield easy improvements.
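To make the idea concrete, here is a minimal sketch of reordering only the near-ties below the top results. This is my own illustration, not the algorithm from the paper. The function name, the `keep_top` and `epsilon` parameters, and the `personal_score` dictionary are all hypothetical.

```python
def rerank_near_ties(results, personal_score, keep_top=2, epsilon=0.05):
    """Reorder near-tied results below the top few using a personal signal.

    results: list of (url, ranker_score) pairs, already in ranker order
             (descending score). Hypothetical input format.
    personal_score: dict mapping url -> how much this user seems to like it.
                    Hypothetical signal, e.g. derived from browsing history.
    """
    head = results[:keep_top]   # never touch the first few results
    tail = results[keep_top:]
    reordered = []
    i = 0
    while i < len(tail):
        # Collect a run of results whose ranker scores are within epsilon
        # of the first result in the run; the ranker treats these as ties.
        j = i + 1
        while j < len(tail) and tail[i][1] - tail[j][1] <= epsilon:
            j += 1
        group = tail[i:j]
        # Stable sort: higher personal score first, ranker order breaks ties.
        group = sorted(group, key=lambda r: -personal_score.get(r[0], 0.0))
        reordered.extend(group)
        i = j
    return head + reordered


results = [("a", 0.95), ("b", 0.90), ("c", 0.60),
           ("d", 0.59), ("e", 0.58), ("f", 0.30)]
personal = {"e": 1.0, "b": 5.0}
reranked = rerank_near_ties(results, personal)
```

In this example only "c", "d", and "e" are close enough to be swapped, so "e" moves up within that group while the top two results and the clearly weaker "f" stay where the ranker put them.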

Second, they did a large scale online evaluation of their personalization approach using click data as judgement of quality. That's pretty rare but important, especially for personalized search where some random offline human judge is unlikely to know the original searcher's intent.

Third, their goal was not to be perfect, but just help more often than hurt. And, in fact, that is what they did, with the best performing algorithm "improving 2.7 times more queries than it harms".
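A ratio like that can be computed from click logs in a few lines. The sketch below is only an assumption about how one might count it, not the paper's evaluation method: it supposes that, for each query, we know the position of the clicked result under the baseline ranking and under the personalized ranking. The function and variable names are hypothetical.

```python
def help_harm_counts(observations):
    """Count queries improved, harmed, or unchanged by personalization.

    observations: list of (baseline_rank, personalized_rank) pairs, one per
    query, giving the position of the clicked result under each ranking
    (1 is the top of the page). Hypothetical input format.
    """
    improved = harmed = unchanged = 0
    for baseline_rank, personalized_rank in observations:
        if personalized_rank < baseline_rank:
            improved += 1      # clicked result moved up the page
        elif personalized_rank > baseline_rank:
            harmed += 1        # clicked result moved down the page
        else:
            unchanged += 1
    return improved, harmed, unchanged


def help_harm_ratio(observations):
    """Return improved / harmed, or None when no query was harmed."""
    improved, harmed, _ = help_harm_counts(observations)
    if harmed == 0:
        return None
    return improved / harmed


# Made-up example data, not results from the paper.
observations = [(5, 3), (4, 4), (3, 6), (7, 2), (8, 5)]
counts = help_harm_counts(observations)
ratio = help_harm_ratio(observations)
```

The point of tracking this number is the one above: the bar is not perfection, just a ratio comfortably above 1 at each step.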

I think those are good lessons for others working on personalized search or even personalization in general. You can take baby steps toward personalization. You can start with minor reordering of pages. You can make low risk changes lower down on the page or only when the results are otherwise tied for quality. As you get more aggressive, you can verify that each step does more good than harm.

One thing I don't like about the paper is that they only investigated using long-term history. There is a lot of evidence (e.g. [1] [2]) that very recent history, your last couple of searches and clicks, can be important, since it may show frustration in an attempt to satisfy some task. But otherwise there are great lessons in this work out of Microsoft Research.

Monday, May 09, 2011

Quick links

Some of what has caught my attention recently:
  • Apple captured "a remarkable 50% value share of estimated Q1/11 handset industry operating profits among the top 8 OEMs with only 4.9% global handset unit market share." ([1]). The iPhone generates 50% of Apple's revenue and even more of their profits. To a large extent, the company is the iPhone company. ([2] [3]) But, Gartner predicts iPhone market share will peak in 2011. ([4])

  • Researchers find bugs in payment systems, order free stuff from Buy.com and JR.com. Disturbing that, when they contacted Buy.com to report the problem, Buy.com's accounting systems had the invoice as fully paid even though they never received the cash. ([1])

  • Eric Schmidt says, "The story of innovation has not changed. It has always been a small team of people who have a new idea, typically not understood by people around them and their executives." ([1])

  • Netflix randomly kills machines in its cluster all the time, just to make sure Netflix won't go down when something real kills their machines. Best part, they call this "The Chaos Monkey". ([1] [2])

  • Hello, Amazon, could I borrow 1,250 of your computers for 8 hours? ([1])

  • Felix Salmon says, "Eventually ... ad-serving algorithms will stop being dumb things based on keyword searches, and will start being able to construct a much more well-rounded idea of who we are and what kind of advertising we're likely to be interested in. At that point ... they probably won't feel nearly as creepy or intrusive as they do now. But for the time being, a lot of people are going to continue to get freaked out by these ads, and are going to think that the answer is greater 'online privacy'. When I'm not really convinced that's the problem at all." ([1])

  • Not sure which part of this story I'm more amazed by, that Google offered $10B for Twitter or that Twitter rejected $10B as not enough. ([1])

  • Apple may be crowdsourcing maps using GPS trail data. GPS trails can also be used for local recommendations, route planning, personalized recommendations, and highly targeted deals, coupons, and ads. ([1] [2] [3])

  • Management reorg at Google. Looks like it knocks back the influence of the PMs to me, but your interpretation may differ. ([1] [2])

  • If you use Google Chrome and go to google.com, you're using SPDY to talk to Google's web servers, not HTTP. Aggressive of Google and very cool. ([1] [2])

  • Shopping search engines (like product and travel) should look for good deals in their databases and then help people find those deals. ([1])

  • When Apple's MobileMe execs started talking about what the poorly reviewed MobileMe was really supposed to do, Steve Jobs demanded, "So why the f*** doesn't it do that?", then dismissed the executives in charge and appointed new MobileMe leaders. ([1] [2])

Friday, May 06, 2011

The value of Google Maps directions logs

Ooo, this one is important. A clever and very fun paper, "Hyper-Local, Direction-Based Ranking of Places" (PDF), will be presented at VLDB 2011 later this year by a few Googlers.

The core idea is that, when people ask for directions from A to B, it shows that people are interested in B, especially if they happen to be at or near A.

Now, certain very large search engines have massive logs of people asking for directions from A to B, hundreds of millions of people and billions of A to B queries. And, it appears this data may be as useful as, or more useful than, user reviews of businesses and maybe GPS trails for local search ranking, recommending nearby places, and perhaps local and personalized deals and advertising.
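As a toy illustration of treating direction queries as votes, here is a minimal sketch that counts a vote for destination B whenever the origin A is near the current user. It is my own simplification, not the ranking method from the paper. The log format, the `radius_km` cutoff, and the function names are all hypothetical.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_places(direction_log, user_location, radius_km=5.0):
    """Rank destinations by direction queries that started near the user.

    direction_log: list of (origin, destination) pairs, where origin is a
    (lat, lon) tuple and destination is a place name. Hypothetical format.
    """
    votes = Counter()
    for origin, destination in direction_log:
        # A query from A to B counts as a vote for B, but only if A is
        # close to where this user is right now.
        if haversine_km(origin, user_location) <= radius_km:
            votes[destination] += 1
    return votes.most_common()


# Made-up log entries for illustration only.
direction_log = [
    ((47.61, -122.33), "cafe"),
    ((47.62, -122.32), "cafe"),
    ((47.60, -122.34), "museum"),
    ((40.71, -74.00), "museum"),
    ((40.71, -74.00), "museum"),
]
ranking = rank_places(direction_log, (47.61, -122.33))
```

Here the museum has more direction queries overall, but the cafe ranks first for this user because more of its queries started nearby. A real system would presumably also have to weigh distance, time, and spam, which this sketch ignores.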

From the paper:
A query that asks for directions from a location A to location B is taken to suggest that a user is interested in traveling to B and thus is a vote that location B is interesting. Such user-generated direction queries are particularly interesting because they are numerous and contain precise locations.

Direction queries [can] be exploited for ranking of places ... At least 20% of web queries have local intent ... [and mobile] may be twice as high.

[Our] study shows that driving direction logs can serve as a strong signal, on par with reviews, for place ranking ... These findings are important because driving direction logs are orders of magnitude more frequent than user reviews, which are expensive to obtain. Further, the logs provide near real-time evidence of changing sentiment ... and are available for broader types of locations.

What is really cool is that, not only is this data easier and cheaper to obtain than customer reviews, but also there is so much more of it that the ranking is more timely (if, for example, ownership changes or a place closes) and coverage much more complete.

I find it a little surprising that Google hasn't already been using this data heavily. In fact, the paper suggests that Google is only beginning to use it. At the end of the paper, the authors write that they hope to investigate what types of queries benefit the most from this data and then look at personalizing the ranking based on each person's specific search and location history.