Thursday, July 19, 2007

The many paths of personalization

Here I go, quoting Gord Hotchkiss again. From an article today:
We're trying to paint personalization into a corner based on Google's current implementation of it. And that's absolutely the wrong thing to do.

Personalization is not a currently implemented algorithm ... [It] is an area of development.

Personalization, in its simplest form, is simply knowing more about you as an individual and using that knowledge to better connect you to content and functionality on the Web.

There are many paths you can take to that same end goal .... To win, Google doesn't have to do it perfectly. It just has to do it better than everyone else.
Right now, relevance ranking must order results according to the average need. But different people have different interpretations of what is or is not relevant. It is getting harder and harder to find improvements while still serving the average need.

At some point, the only way to further improve the quality of search results will be to show different people different search results based on what they think is relevant.

At that point, we have personalized search. Showing different results to different people based on what you know of their interests is personalized search.

There are many approaches to personalized search. To the extent that Google's current algorithm is based on the Kaltix work, it is a coarse-grained approach, building a long-term profile which then subtly influences your future results. I tend to prefer a fine-grained approach that focuses on short-term history to help searchers with what they are doing right now.
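As a rough illustration of the contrast, and not a description of anyone's actual implementation, here is a small Python sketch of the two flavors: a long-term profile that subtly nudges scores versus the last few actions in a session that strongly influence the current task. The scoring functions, topic labels, and weights are all made up.

```python
# A minimal sketch contrasting the two approaches described above. All names
# and weights here are hypothetical, chosen only to illustrate the idea, not
# to describe Google's (or anyone's) actual implementation.

def coarse_grained_score(base_score, doc_topics, long_term_profile):
    """Nudge the base relevance score using a slowly built profile of
    long-term topic interests (e.g., accumulated over months of history)."""
    boost = sum(long_term_profile.get(topic, 0.0) for topic in doc_topics)
    return base_score + 0.05 * boost  # subtle influence

def fine_grained_score(base_score, doc_topics, recent_session):
    """Nudge the base relevance score using only the last few queries and
    clicks in the current session, to help with the task at hand."""
    recent_topics = {t for action in recent_session for t in action["topics"]}
    overlap = len(recent_topics & set(doc_topics))
    return base_score + 0.5 * overlap  # strong, short-lived influence

# Example: a searcher who just looked at pages about the car sense of "jaguar"
session = [{"query": "jaguar xk review", "topics": ["cars", "jaguar"]}]
print(fine_grained_score(1.0, ["cars", "jaguar"], session))     # boosted
print(fine_grained_score(1.0, ["animals", "jaguar"], session))  # smaller boost
```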

Yet those are only two of the ways that knowing more about a searcher's interests could help improve relevance. Google's implementation is not the only path. There are many ways to show different people different results based on their interests, some of which could prove more helpful to searchers than Google's.

See also some of my earlier posts ([1] [2] [3]) where I criticized Google's approach to personalized search and discussed an alternative.

See also my March 2005 post, "Personalization is hard. So what?", where I said that personalization "doesn't have to be right all the time. It just needs to be helpful."

Monday, July 16, 2007

Interview of Peter Norvig

Kate Greene at MIT Technology Review just published an interview with AI guru and Googler Peter Norvig.

HBase: A Google Bigtable clone

HBase appears to be a very early stage open source project to clone Google Bigtable.

From the project page:
Google's Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).
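To make that access pattern concrete, here is a rough Python sketch of a table with rows and columns, range scans, and single-cell gets, and no SQL. The class and method names (ToyTable, put, get, scan) are hypothetical illustrations, not the actual HBase or Bigtable API, which is Java-based.

```python
# Hypothetical sketch of an iterator-style table interface, loosely in the
# spirit of the Bigtable/HBase data model described above. These names are
# illustrative only; they are not the real HBase API.

class ToyTable:
    def __init__(self):
        self.rows = {}  # row_key -> {column: value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        """Retrieve a single column value for a specific row key."""
        return self.rows.get(row_key, {}).get(column)

    def scan(self, start_row, stop_row):
        """Iterate over rows in [start_row, stop_row), in key order.
        There is no query language; callers just walk the row range."""
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key, self.rows[key]

t = ToyTable()
t.put("com.example/page1", "anchor:home", "Example")
t.put("com.example/page2", "contents:html", "<html>...</html>")
for row_key, columns in t.scan("com.example/", "com.example0"):
    print(row_key, list(columns))
```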
Note that two of the four early contributors -- Jim Kellerman and Michael Stack -- are from Powerset. Some members of the Powerset team also have been trying to run Hadoop on Amazon EC2.

See also my earlier posts, "Google Bigtable paper" and "Google's Bigtable".

See also my earlier posts, "Yahoo building a Google FS clone?" and "GFS, MapReduce, and Hadoop".

Update: Looking at the HBase source tree, there appear to be other contributors who are not listed on the project page. Not surprising to see that Mike Cafarella and Doug Cutting were deeply involved.

Sunday, July 15, 2007

Caching and search engine performance

Ricardo Baeza-Yates and others from Yahoo Research have a paper at the upcoming SIGIR 2007 conference, "The Impact of Caching on Search Engines" (PDF).

Some excerpts:
[Caching] is either off-line (static) or online (dynamic). A static cache is based on historical information and is periodically updated. A dynamic cache replaces entries according to the sequence of requests.

There are [also] two possible ways to use a cache memory: caching answers ... [and] caching terms ... Returning an answer to a query ... in the cache is more efficient than computing the answer using cached posting lists ... [but] previously unseen queries occur more often than previously unseen terms, implying a higher miss rate for cached answers.

Our results show ... caching terms is more effective with respect to miss rate, achieving values as low as 12%. We also propose a new algorithm for static caching of posting lists that outperforms previous static caching algorithms as well as dynamic algorithms such as LRU and LFU, obtaining hit rate values that are over 10% higher compared to these strategies.
For me, the most interesting result in the paper is the apparent viability of static caching. Offhand, I would not have expected static caching -- merely pre-caching items based on old query log data -- to be as effective as they claim it would be.

Frankly, if I were given this particular caching problem, my inclination would be to immediately leap to an implementation using dynamic caching. This paper argues that would be foolish. If you are like me, perhaps there is a general lesson in this, that we should not be so quick to dismiss static caching.
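As a toy illustration of the distinction, the sketch below precomputes a static cache of query answers from an old log and compares its hit rate against a simple LRU cache on new traffic. The query streams and cache sizes are made up, and real engines cache answers and posting lists at vastly larger scale.

```python
from collections import Counter, OrderedDict

def static_cache(training_log, capacity):
    """Static caching: pick the most frequent queries from historical logs
    and pin their answers in the cache; never evict at serving time."""
    return {q for q, _ in Counter(training_log).most_common(capacity)}

def hit_rate_static(cache, live_queries):
    hits = sum(1 for q in live_queries if q in cache)
    return hits / len(live_queries)

def hit_rate_lru(live_queries, capacity):
    """Dynamic caching: an LRU cache of query answers, filled on the fly."""
    cache = OrderedDict()
    hits = 0
    for q in live_queries:
        if q in cache:
            hits += 1
            cache.move_to_end(q)
        else:
            cache[q] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(live_queries)

# Toy data: popular queries repeat across the training and live logs.
training = ["weather", "news", "maps", "weather", "news", "weather"]
live = ["weather", "news", "sigir 2007", "weather", "maps", "news"]
cache = static_cache(training, capacity=2)
print(hit_rate_static(cache, live), hit_rate_lru(live, capacity=2))
```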

I think there also is another question to ponder that is not directly addressed in this paper. What should the size of a caching layer be?

In the paper, very large cache sizes are considered, as much as 100% of the index size. However, the paper seems to treat the decision about the resources (the number of machines) allocated to the caching layer as independent of the number of machines allocated to the index layer.

What is more realistic is a situation where I have N machines and X of them can be allocated to the index layer and (N-X) of them can be allocated to the caching layer.

Why does this matter? The more machines I allocate to the index layer, the faster the index layer becomes, because more index shards fit in memory. The number of machines allocated to caching cannot be considered in isolation of the number of machines allocated to the index shards. Shifting machines from one to the other changes the performance characteristics of both.
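Here is a deliberately crude back-of-the-envelope sketch of that coupling. Every constant in it (the hit-rate curve, the latencies, the memory sizes) is invented; the only point is that moving machines between the two layers changes both the hit rate and the cost of a miss, so the allocations cannot be tuned independently.

```python
# Back-of-the-envelope sketch of the coupling argued above. All constants
# are made up for illustration; they do not come from the paper.

def estimated_latency_ms(n_machines, cache_machines,
                         index_size_gb=1000.0, ram_per_machine_gb=16.0):
    index_machines = n_machines - cache_machines

    # Hypothetical: cache hit rate grows with cache machines, with
    # diminishing returns (popular queries saturate quickly).
    hit_rate = min(0.6, 0.1 * cache_machines ** 0.5)

    # Hypothetical: the more of each index shard that fits in RAM, the
    # cheaper a cache miss is, because fewer postings come off disk.
    in_memory_fraction = min(1.0, index_machines * ram_per_machine_gb / index_size_gb)
    miss_latency = 200.0 - 150.0 * in_memory_fraction  # 200ms disk-bound -> 50ms in RAM

    hit_latency = 5.0
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency

for cache_machines in (0, 10, 30, 50):
    print(cache_machines, round(estimated_latency_ms(100, cache_machines), 1))
```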

Expanding this out from search engines, the same question applies to database layers. Often, the reaction to a slow database is to add layers of caching. In some cases, these caching layers grow to where the number of machines devoted to caching exceeds the number of machines in the database layer.

At that point, the question has to be asked, would those machines be better used in the database layer? How much faster could the database be if each of the database shards could be kept entirely in memory? Would most or even all of that increasingly convoluted caching layer suddenly become unnecessary?

Going back to the paper, I think it is a worthwhile read, not only for the research work reported, but also for the thoughts it provokes. Caching may seem simple, but getting it right is not. There are more questions to ponder here than might at first appear.

Update: Chad Walters (former architect on Yahoo web search, now at Powerset) posts some interesting thoughts on this paper and some of the practical issues that he has seen come up when considering static versus dynamic caching.

Thursday, July 12, 2007

Netflix Prize enabling recommender research

Yi Zhang and Jonathan Koren from UC Santa Cruz have a paper, "Efficient Bayesian Hierarchical User Modeling for Recommender Systems" (PDF), at the upcoming SIGIR 2007 conference.

The paper focuses on "improving the efficiency and effectiveness of Bayesian hierarchical linear models, which have a strong theoretical basis and good empirical performance on recommendation tasks." Specifically, they significantly improve the speed of running expectation-maximization by ignoring the "many dimensions that are not 'related' to a particular user", focusing on "related user-feature pairs".
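My loose reading of where the speedup comes from, sketched below with hypothetical names and without the actual Bayesian hierarchical model: restrict each E-step to the user-feature pairs that are actually related (observed), rather than touching every dimension for every user.

```python
# A loose sketch of the sparsity idea, not the paper's actual model: instead
# of updating every (user, feature) pair in an E-step, only touch the pairs
# where the user actually interacted with the feature. The data below is
# hypothetical and the "update" is a placeholder for the real EM math.

def dense_e_step(n_users, n_features, update):
    for u in range(n_users):
        for f in range(n_features):
            update(u, f)            # O(n_users * n_features) work

def sparse_e_step(related_pairs, update):
    for u, f in related_pairs:      # only the user-feature pairs that matter
        update(u, f)                # O(number of observed pairs) work

touched = []
ratings = {(0, 3): 5, (0, 7): 2, (1, 3): 4}   # (user, feature) -> rating, very sparse
sparse_e_step(ratings.keys(), lambda u, f: touched.append((u, f)))
print(len(touched), "updates instead of", 2 * 10)  # 3 vs. a dense pass over 2 users x 10 features
```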

That is interesting, especially since Yi Zhang is part of the wxyzconsulting.com team that is currently in 10th place on the Netflix Prize Leaderboard.

Even so, what primarily excited me about this paper is that it is an excellent example of research on scaling recommender systems that simply would not be possible without the Netflix data set. The Netflix data set is two orders of magnitude bigger than the MovieLens and EachMovie data sets that were previously available.

Time and time again, I have been frustrated to read recommender system papers that are twiddling around on toy problems. They often achieve minor improvements in quality, but at the cost of efficiency, usually presenting techniques that will not scale to catalogs or numbers of customers of even modest size.

One might argue that these improvements are of theoretical interest, but there is some doubt about even that. As Googler Peter Norvig said:
Rather than argue about whether this algorithm is better than that algorithm, all you have to do is get ten times more training data. And now all of a sudden, the worst algorithm ... is performing better than the best algorithm on less training data.

Worry about the data first before you worry about the algorithm.
The problem, as Michele Banko and Eric Brill noted (PDF) in 2001, is that algorithmic twiddles that yield improvements at small scale may no longer yield improvements at large scale, and that training with more data often yields bigger improvements than algorithmic changes do.

Now that Netflix has made big data available to researchers, we will see a lot more work on large scale, practical recommender systems like this Zhang and Koren paper. And we all will benefit.

See also my Jan 2007 post, "The Netflix Prize and big data".

[Thanks, Siddharth, for pointing out the Zhang and Koren paper]

Update: Two months later, the KDD Cup 2007 is done, and it focused on the Netflix Prize. Don't miss the great collection of papers. Very impressive how much work on recommender systems at large scale is coming out of this contest.

Update: An ECML 2007 paper, "Principal Component Analysis for Large Scale Problems with Lots of Missing Values" (PDF), has a nice discussion of using PCA for recommendations at large scale, particularly around the issue of overfitting. [Found via Simon Funk]

Wednesday, July 11, 2007

Facebook and the perils of free APIs

The hype over Facebook widgets has reached dot-com levels of frenzy. As Matt Marshall at VentureBeat reports:
Silicon Valley venture capital firm Bay Partners said it wants to write checks of between $25,000 and $250,000 to developers writing applications for Facebook's platform.
While the success of iLike using the Facebook API is impressive, they have not bet their business on the continued use of the Facebook API. Startups that do are completely dependent on the continued goodwill of Facebook.

A look at the terms of service for the Facebook API should make it very clear just how fragile this dependency is:
We reserve the right to charge a fee for using the Facebook Platform and/or any individual features thereof at any time in our sole discretion ... We reserve the right to specify the manner in which the fee will be calculated, the terms on which you will be invoiced and charged and the terms of payment.

Facebook may at any time in its sole discretion, without liability, with or without cause and with or without notice ... terminate or suspend your access to the Facebook Platform.
Perhaps a startup could attempt to negotiate different terms for a very popular widget, but Facebook holds all the cards in that negotiation and likely will extract most or all of any value generated by the widget.

Om Malik seems to agree:
Seemingly clever guardians of wealth are getting caught up in the euphoria and loosening their purse strings.

App Factory is interesting, cringe-worthy reading filled with cliches like "application entrepreneurs" and "affect adoption, virality, and usage."

Putting my newly acquired Yiddish skills to use, I say, Oy-vey!

Facebook, despite the cleverness of its recent platform strategy, is still a start-up, and a funding vehicle focused entirely on its ecosystem seems a bit rash. There is still a fog around these Facebook apps-as-businesses.
See also my earlier post, "The truth about free APIs".

Google Scalability Conference talks available

Nearly all of the talks from the Google Seattle Conference on Scalability are now available on Google Video. Unfortunately, the Amazon.com talk, "Challenges in Building an Infinite Scalable Datastore" by Swami Sivasubramanian and Werner Vogels, does not yet appear to be available.

However, two additional talks that appear to be part of the conference but were not on the original schedule are also available. From the talks I have watched so far, it looks like it was a fantastic conference. I am disappointed I could not make it.

See also my earlier post, "More on Google Scalability Conference", that has links off to notes some attendees took during some of the talks.

Update: By the way, Marissa's talk, which is light and talks about the future of Google, is most likely to be of broad interest. If you only have time to watch one of these, watch that one. Her presentation is good, but the Q&A session starting at 34:47 is the thing not to miss. Marissa also talks up personalized search at 33:08 of her presentation, saying that it is key for building "the search engine of the future."

Update: In case you don't watch the YouTube talk, let me summarize it briefly. They tried to do MySQL database replication, faced many problems for a long time, and finally switched to what they should have done in the first place, large scale database partitioning. Unfortunately, this is a common mistake at many places struggling with scaling, overdoing database replication and caching and underdoing partitioning.
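For readers unfamiliar with partitioning, here is a minimal Python sketch of hash-based sharding, the approach described above, under the assumption that plain dicts stand in for real database connections. A real deployment would also need replication for fault tolerance and a plan for resharding as data grows.

```python
import hashlib

# Minimal sketch of hash-based partitioning (sharding): route each user's
# data to one of several independent database shards instead of pushing all
# writes through a single replicated master. The dicts are stand-ins for
# real database handles.

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return shards[int(digest, 16) % NUM_SHARDS]

def save_profile(user_id, profile):
    shard_for(user_id)[user_id] = profile   # the write goes to exactly one shard

def load_profile(user_id):
    return shard_for(user_id).get(user_id)  # the read hits the same shard

save_profile("alice", {"favorites": ["video123"]})
print(load_profile("alice"))
```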

MSN Search gains?

Danny Sullivan at Search Engine Land analyzes a Compete.com report that shows MSN Search (aka Windows Live Search) gaining market share, "jumping from 8.4 percent in May 2007 to 13.2 percent in June."

Danny and a follow-up post from Steve Willis at Compete.com both credit the gain to a marketing promotion Microsoft is running called Live Search Club that apparently involves executing many searches as part of playing games and winning prizes. Both note that MSN Search share was only 9.1 percent -- a modest gain if not flat -- if you exclude all the Live Search Club searches.

Even if this is all from Live Search Club, I am surprised. I would not have thought a promotion like that could result in this much of a gain. I also would doubt such a gain can be sustained much beyond the duration of the promotion, but it is a trend worth watching.

See also my Nov 2006 post, "Google dominates, MSN Search sinks".

Update: Looks like at least part of those gains from that promotion were temporary. Two months later, Todd Bishop reports Live Search market share has dropped in July and August.

Tuesday, July 10, 2007

Attacking recommender systems

A good (but very long) paper by Mobasher et al., "Towards trustworthy recommender systems: An analysis of attack models and algorithm robustness" (PDF), explores a variety of ways of spamming or otherwise manipulating recommendation systems.

Some excerpts:
An attack against a collaborative filtering recommender system consists of a set of attack profiles, each containing biased rating data associated with a fictitious user identity, and including a target item, the item that the attacker wishes the system to recommend more highly (a push attack), or wishes to prevent the system from recommending (a nuke attack).

Previous work had suggested that item-based collaborative filtering might provide significant robustness compared to the user-based algorithm, but, as this paper shows, the item-based algorithm also is still vulnerable in the face of some of the attacks we introduced.
The paper lists several types of attacks, suggests several ways to detect attacks, tests several attacks using the GroupLens movie data set, and concludes that "item-based proved far more robust overall" but that "a knowledge-based / collaborative hybrid recommendation algorithm .... [that] extends item-based similarity by combining it with content based similarity .... [seems] likely to provide defensive advantages for recommender systems."
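To make the attack model concrete, here is a toy sketch of how a push attack's fake profiles are structured: the maximum rating on the target item plus filler ratings so the profiles blend in with genuine users. The catalog, rating scale, and counts are invented, and the attacks studied in the paper choose filler ratings more carefully than random values.

```python
import random

# Toy sketch of the "push attack" profile structure described above. The
# catalog and rating scale are made up for illustration.

def build_push_attack(target_item, catalog, n_profiles=50, n_filler=10,
                      max_rating=5, seed=0):
    rng = random.Random(seed)
    profiles = []
    for i in range(n_profiles):
        ratings = {target_item: max_rating}  # push the target item
        for item in rng.sample([c for c in catalog if c != target_item], n_filler):
            ratings[item] = rng.randint(1, max_rating)  # filler to look normal
        profiles.append({"user": f"fake_user_{i}", "ratings": ratings})
    return profiles

catalog = [f"movie_{j}" for j in range(100)]
attack = build_push_attack("movie_42", catalog, n_profiles=3, n_filler=5)
print(attack[0])
```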

It is worth noting that spam is a much worse problem with winner-take-all systems that show the most popular or most highly rated articles (like Digg). In those systems, spamming gets you seen by everyone.

In recommender systems, spamming only impacts the fraction of the users who are in the immediate neighborhood and see the spammy recommendations. The payoff is much reduced and so is the incentive to spam.

For more on that, please see my Jul 2006 post, "Combating web spam with personalization" and my Jan 2007 post, "SEO and personalized search".

The same authors published a similar but much shorter article, "Attacks and Remedies in Collaborative Recommendation", in the May/June IEEE Intelligent Systems, but there is no full text copy of that article easily available online.

Thanks, Gary Price, for pinging me about the IEEE Intelligent Systems May/June issue and pointing out that it has several articles on recommender systems.

Saturday, July 07, 2007

People often repeat web searches

Teevan et al. have a paper, "Information Re-Retrieval: Repeat Queries in Yahoo's Logs" (PDF), at the upcoming SIGIR 2007 conference.

The paper points out that people often repeat queries to re-find information they found in the past. It also shows that these repeat queries can be predicted and, by maintaining a search history for each user and surfacing it at appropriate times, facilitated.

Some excerpts:
People often repeat Web searches, both to find new information on topics they have previously explored and to re-find information they have seen in the past ... Our study demonstrates that as many as 40% of all queries are re-finding queries.

Re-finding appears to be an important behavior for search engines to explicitly support, and we explore how this can be done. We demonstrate that changes to search engine results can hinder re-finding, and provide a way to automatically detect repeat searches and predict repeat clicks.
One proposal in the paper is to move items up in search results that a searcher has clicked on in the past:
The data is suggestive of a positive improvement in time-to-click for positive changes in rank as well as some benefit to no change (likely due to learning). When previously clicked results move down in rank, time-to-click increases.

A hypothesis consistent with previous work on eye-tracking in search is that users pay more attention to early-ranked items. Thus, if a previously clicked on result moves up, it is more easily re-found via a visual scan.
The authors also make other suggestions on how to surface search history and how to deal with the conflict between finding new information and re-finding old:
Traditionally, search engines have focused on returning search results without consideration of the user’s past query history, but the results of the log study suggest it might be a good idea for them to do otherwise.

Although finding and re-finding tasks may require different strategies, tools will need to seamlessly support both activities ... Because people repeat queries so frequently, search engines should assist their users by providing a means of keeping a record of individual users' search histories.

[There] may benefit from having a different amount of screen real estate devoted to displaying ... search history ... Search histories could be customized based on many factors including the time of day. Users with a large number of navigational queries may also benefit from the direct linking to the Webpage (possibly labeled with the frequent query term).

While a user may simultaneously have a finding and re-finding intent when searching, satisfying both needs may be in conflict. Finding new information means being returned the best new information, while re-finding means being returned the previously viewed information.

We found that when previously viewed search results changed to include new information, the searcher’s ability to re-find was hampered. It is important to consider how the two search modalities can be reconciled so a user can interact with new, and previously seen, information.
There are also some interesting breakdowns of the types of search queries they saw in tables 1-3 on pages 3-4 of the paper.

On a related note, one thing I like about this work is that it shows the value of starting to walk down the path toward search personalization.

A first and necessary step toward personalization is to start maintaining search and viewing history for each user. As Teevan et al. point out, search and viewing history has a lot of value when surfaced to users.

When companies ask me about personalization, I often recommend they start with baby steps -- maintaining and surfacing history, showing related content -- rather than trying to implement full personalization immediately. These are relatively simple to implement and need to be done well anyway before anyone can do full personalization.
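A minimal sketch of that first baby step, assuming an in-memory store: record per-user clicks keyed by query, and when the same query repeats, move previously clicked results toward the top so re-finding is not hindered when the underlying ranking shifts.

```python
from collections import defaultdict

# Minimal sketch of the "baby step" described above. The in-memory store and
# the boosting rule are hypothetical.

class SearchHistory:
    def __init__(self):
        self.clicks = defaultdict(set)   # (user_id, query) -> set of clicked urls

    def record_click(self, user_id, query, url):
        self.clicks[(user_id, query)].add(url)

    def rerank_for_refinding(self, user_id, query, results):
        """Move previously clicked results toward the top, so a repeat query
        re-finds what the user saw before even if the ranking has changed."""
        seen = self.clicks.get((user_id, query), set())
        return sorted(results, key=lambda url: (url not in seen, results.index(url)))

history = SearchHistory()
history.record_click("u1", "sigir 2007", "sigir2007.org/program")
fresh_results = ["example.com/new-page", "sigir2007.org/program", "example.org/other"]
print(history.rerank_for_refinding("u1", "sigir 2007", fresh_results))
```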

Much value can be gained from the first steps on the road to personalization. Pick the low hanging fruit before trying to tackle the harder problems.

See also my previous posts ([1] [2] [3]) on some of Jaime Teevan's work on personalized search.

Stonebraker on fast databases

Very interesting interview with database guru Michael Stonebraker in the May ACM Queue. Some excerpts:
Data warehouses are getting positively gigantic. It's very hard to run ad hoc queries against 20 terabytes of data and get an answer back anytime soon. The data warehouse market is one where we can get between one- and two-orders-of-magnitude performance improvements.

Stream processing ... a feed comes out of the wall and you run it through a workflow to normalize the symbols, clean up the data, discard the outliers, and then compute some sort of secret sauce .... This is a fire hose of data ... A specialized architecture can just clobber the relational elephants in this market.

None of the big text vendors, such as Google and Yahoo, use databases; they never have. They didn't start there, because the relational databases were too slow from the get-go. Those guys have all written their own engines.

In scientific and intelligence databases ... if you have array data and use special-purpose technology that knows about arrays, you can clobber a system in which tables are used to simulate arrays.

[For] Wall Street .... it's basically a latency arms race. If your infrastructure was built with one-second latency, it's just impossible to continue, because if the people arbitraging against you have less latency than you do, you lose. A lot of the legacy infrastructures weren't built for sub-millisecond latency.

Let's say you have an architecture where you process the data from the wire and then use your favorite messaging middleware to send it to the next machine, where you clean the data. People just line up software architectures with a bunch of steps, often on separate machines, and often on separate processes. And they just get clobbered by latency.

Is the relational model going to make it? In semi-structured data, it's already obvious that it's not ... Data warehouses ... are better modeled as entity relationships rather than in a relational model.

Both the programming language interface and the data model can be thrown up in the air. We aren't in 1970. It's 37 years later, and we should rethink what we're trying to accomplish and what are the right paradigms to do it.
[Interview found via Werner Vogels]

Thursday, July 05, 2007

What to advertise when there is no commercial intent?

Bill Slawski at Search Engine Land, after reading a WWW 2007 paper that classifies search engine queries, notes that "most queries are noncommercial."

These queries pose a problem for advertising. What do you advertise on web pages or queries that have no commercial intent?

There appear to be two approaches: (1) Target ads anyway, doing the best job we can. (2) Look elsewhere for commercial intent.

Most web advertising today uses the first method, targeting the ads anyway, sometimes with hilarious consequences. For example, Google AdSense, on news stories about immigrant trafficking, has been known to show ads for boat rentals and other services that might assist in immigrant trafficking. A news story I just saw about a plane crash was showing ads for firefighting equipment and surplus army MREs.

Given the lack of commercial intent on these queries, it is hard to do better. What ads should we show next to an article about a plane crash or immigrant trafficking? Is anything really appropriate?

Perhaps not. That is where the second approach comes in. What if we look elsewhere for commercial intent instead?

Personalized advertising is not just about showing different advertising to different people. Personalized advertising often requires maintaining a history of what someone has done recently.

In this recent history, there will be items that have strong commercial intent, in many cases much stronger commercial intent than the current page. Reaching back into the history to target the ads might be more appropriate than targeting to the current page.

For example, let's say I just looked at a review of a car, then looked at a news article about a plane crash. One of these actions shows strong commercial intent, the other does not. We can target ads easily to the car review, but not to the plane crash.
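Here is a small sketch of that "reach back" idea: score recent pages for commercial intent and target ads to the most recent high-intent item rather than the current page. The history format, intent scores, and ad inventory are all invented for illustration; a real system would classify intent with a trained model.

```python
# Small sketch of "reaching back" into recent history for commercial intent.
# Scores, topics, and ads below are hypothetical.

COMMERCIAL_INTENT = {"car_review": 0.9, "plane_crash_news": 0.05, "home_page": 0.0}
ADS_BY_TOPIC = {"car_review": ["local auto dealer", "car insurance quotes"]}

def pick_ad(current_page, recent_history, intent_threshold=0.5):
    """Prefer the current page if it shows commercial intent; otherwise walk
    backward through recent history looking for something that does."""
    for page in [current_page] + list(reversed(recent_history)):
        if COMMERCIAL_INTENT.get(page["topic"], 0.0) >= intent_threshold:
            return ADS_BY_TOPIC.get(page["topic"], [])
    return []  # nothing with commercial intent; perhaps show no ad at all

history = [{"url": "/reviews/sedan-xyz", "topic": "car_review"}]
current = {"url": "/news/plane-crash", "topic": "plane_crash_news"}
print(pick_ad(current, history))   # targets the car review, not the crash story
```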

At the moment, newspapers, weblogs, and other web sites have great content, but difficulty monetizing that content. This is because much of the content reveals little or no information about the commercial intent of the people viewing the content. Ads targeted to those pages are often ineffective and, because of the lack of commercial intent, probably will continue to be no matter how much content targeting improves.

Personalized advertising reaches back into the past, looking for something that has commercial intent. Rather than attack the problem of divining commercial intent where there is none, personalized advertising tackles the much easier problem of recalling past actions that do show strong commercial intent.

Surprisingly, at least to me, there appear to be few examples of personalized advertising systems that attempt fine-grained targeting to recent history. Findory attempted to build one that targets to content in your Findory history. Amazon built Omakase, which targets to items in your Amazon purchase and viewing history.

Most other attempts at personalized advertising appear to target to high-level subject interests or demographics, not to recent history. They are missing the opportunity to find commercial intent in what I did recently rather than in what I am doing now.

Google cuts Founders' Awards

Interesting tidbit in a July 2007 Fortune article, "Close to the Vest", about retention at Google:
Early on Page and Brin gave "Founders' Awards" in cash to people who made significant contributions. The handful of employees who pulled off the unusual Dutch auction public offering in August 2004 shared $10 million.

The idea was to replicate the windfall rewards of a startup, but it backfired because those who didn't get them felt overlooked. "It ended up pissing way more people off," says one veteran.

Google rarely gives Founders' Awards now, preferring to dole out smaller executive awards, often augmented by in-person visits by Page and Brin.
It is not surprising that the Founders' Awards would piss people off. No matter how it was done, I am sure the decision of who received the award seemed arbitrary to many of those who did not get it.

See also my May 2007 post, "Management and total nonsense", where I reviewed a management book by Jeffrey Pfeffer and Robert Sutton and quoted them as saying:
People are more likely to ... see themselves more positively than others see them [and] believe they are above average or not recognize their lack of competence.

People who receive a smaller reward than they expect routinely resent the organization.
See also my April 2006 post, "Early Amazon: Just do it", where I said:
While merit pay sounds like a great idea in theory, it seems it never works in any large organization. It appears to be impossible to do fairly -- politics and favoritism always enter the mix -- and, even if it could be done fairly, it never makes people happy.

Instead, compensation should be high but basically flat. Merit rewards should focus on non-monetary compensation.

Maybe even a stinky old shoe.
[Fortune article found via Marc Andreessen]

Update: See also Dare Obasanjo's thoughts on Marc Andreessen's post, especially Dare's conclusions on "significantly differentiated financial rewards for your 'best employees'."

Dare also points to an April 2000 post from Joel Spolsky called "Incentive Pay Considered Harmful". Note that Joel concludes that there should be no merit-based rewards at all, not even stinky old shoes.

Wednesday, July 04, 2007

Learning cluster labels from search logs

I much enjoyed a paper, "Learn from Web Search Logs to Organize Search Results" (PDF), that will be presented by Xuanhui Wang and ChengXiang Zhai at the upcoming SIGIR 2007 conference.

The idea is simple but clever. Search logs contain queries that people used to find the information they wanted. These queries might be better labels for groups of web documents than labels extracted from the content.

From the paper:
When the search results are diverse (e.g., due to ambiguity or multiple aspects of a topic) [or poor] ... it [may] be better to group the search results into clusters so that a user can easily navigate into a particular interesting group.

The general idea in virtually all the existing work is to perform clustering on a set of top-ranked search results to partition the results into natural clusters, which often correspond to different subtopics of the general query topic.

Clusters discovered in this way do not necessarily correspond to the interesting aspects of a topic from the user's perspective .... The cluster labels generated [also often] are not informative enough to allow a user to identify the right cluster.

For example, the ambiguous query "jaguar" may mean an animal or a car. A cluster may be labeled as "panthera onca." Although this is an accurate label for a cluster with the "animal" sense of "jaguar", if a user is not familiar with the phrase, the label would not be helpful.

Our idea of using search engine logs is to treat these logs as past history, learn users' interests using this history data automatically, and represent their interests by representative queries.

For example, in the search logs, a lot of queries are related to "car" and this reflects that a large number of users are interested in information about "car." Different users are probably interested in different aspects of "car." Some are looking for renting a car, thus may submit a query like "car rental"; some are more interested in buying a used car, and may submit a query like "used car"; and others may care more about buying a car accessory, so they may use a query like "car audio."

By mining all the queries which are related to the concept of "car", we can learn the aspects that are likely interesting from a user's perspective.

[Our] experiments show that our log-based method can consistently outperform the cluster-based method and improve over the ranking baseline, especially when the queries are difficult or the search results are diverse. Furthermore, our log-based method can generate more meaningful aspect labels than the cluster labels generated based on search results when we cluster search results.
There are some compelling examples of the different labels you get with the two methods (search log-based vs. content-based) on page 7 in tables 5 and 6.
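As a rough sketch of the general idea, and not the paper's actual algorithm, which learns aspects from the logs more carefully, one could label each result cluster with the most frequent past query whose clicked pages fall inside that cluster. The toy log, URLs, and clusters below are invented.

```python
from collections import Counter

# Rough sketch: label each result cluster for an ambiguous query with the
# most frequent past query (from a log) whose clicks land in that cluster.

query_log = [
    {"query": "jaguar price", "clicked": "cars.example.com/jaguar"},
    {"query": "jaguar price", "clicked": "dealers.example.com/jaguar-xk"},
    {"query": "jaguar habitat", "clicked": "wildlife.example.org/jaguar"},
]

clusters = {
    "cluster_a": {"cars.example.com/jaguar", "dealers.example.com/jaguar-xk"},
    "cluster_b": {"wildlife.example.org/jaguar", "zoo.example.org/big-cats"},
}

def label_clusters(clusters, query_log):
    labels = {}
    for name, urls in clusters.items():
        related = Counter(e["query"] for e in query_log if e["clicked"] in urls)
        # Fall back to a generic label if no past query points into the cluster.
        labels[name] = related.most_common(1)[0][0] if related else "(no label)"
    return labels

print(label_clusters(clusters, query_log))
# e.g. {'cluster_a': 'jaguar price', 'cluster_b': 'jaguar habitat'}
```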

The paper also mentions that this kind of technique only appears to help when the query is ambiguous or difficult. The same is true of personalization, but I think that is to be expected. If we already have good information about user intent -- the query is not ambiguous and the top results are already just dandy -- more information (from personalization, clustering, NLP, or anything else) will not make any difference.

In all, good stuff. I love techniques that look at what people are finding and doing to help other people. Buried in the search logs, there is all the hard work searchers do to discover what they need in a sea of information. By surfacing it, we can help searchers help each other.

Tuesday, July 03, 2007

Slides from Google Searchology

Philipp Lenssen noted that the video and slides from Google's May 16 Searchology event are available.

The slides are a little painful to skim in the UI provided. I threw them into a big, long list, which might be useful if you, like me, want to just zip over them quickly.

There is some good coverage of the event at Search Engine Land and TechCrunch.

Characterizing the value of personalized search

Jaime Teevan, Susan Dumais, and Eric Horvitz from Microsoft Research have a poster at the upcoming SIGIR 2007 conference titled "Characterizing the Value of Personalizing Search" (PDF).

Some excerpts:
Our analysis suggests that while search engines do a good job of ranking results to maximize global happiness, they do not do a very good job for specific individuals.

We observed a great deal of variation in participants' rating of results. One reason for the variability in ratings is that participants associated different intents with the same query.

Even when the [user-provided] detailed descriptions [of their intent] were very similar, ratings varied ... It was clearly hard for participants to accurately describe their intent.

The ratings for some queries showed more agreement than others, suggesting that some queries may be intrinsically less ambiguous.

[Current rankings tend] to be closer to the ranking that is best for the group than ... best for the individual ... A considerable gap in ... quality is created by requiring the lists to be the same for everyone.
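To make that gap concrete, here is a toy sketch that compares one shared ranking against each user's personally ideal ranking, given per-user ratings. The ratings and the use of a simple DCG measure are illustrative assumptions, not the poster's actual methodology.

```python
import math

# Toy sketch of the "gap" described above: one shared ranking versus each
# user's own best ranking. Ratings and the DCG measure are made up.

ratings = {   # user -> {result: rating}, same ambiguous query for everyone
    "alice": {"cars.example/jaguar": 3, "wildlife.example/jaguar": 0},
    "bob":   {"cars.example/jaguar": 0, "wildlife.example/jaguar": 3},
}

def dcg(ranking, user_ratings):
    return sum(user_ratings.get(doc, 0) / math.log2(i + 2)
               for i, doc in enumerate(ranking))

docs = ["cars.example/jaguar", "wildlife.example/jaguar"]
group_ranking = sorted(docs, key=lambda d: -sum(r.get(d, 0) for r in ratings.values()))

for user, user_ratings in ratings.items():
    personal = sorted(docs, key=lambda d: -user_ratings.get(d, 0))
    gap = dcg(personal, user_ratings) - dcg(group_ranking, user_ratings)
    print(user, "loses", round(gap, 2), "DCG by sharing one ranking")
```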
See also my March 2006 post, "Beyond the commons: Personalized web search", that discusses another paper on a similar topic by the same authors.