Learning to aggregate vertical results into web search results

Aggregated search is the task of integrating results from potentially multiple specialized search services, or verticals, into the Web search results. The task requires predicting not only which verticals to present (the focus of most prior research), but also where in the Web results to present them (i.e., above or below the Web results, or somewhere in between). Learning models to aggregate results from multiple verticals poses two major challenges. First, because verticals retrieve different types of results and address different search tasks, results from different verticals are associated with different types of predictive evidence (or features). Second, even when a feature is common across verticals, its predictiveness may be vertical-specific. Approaches to aggregating vertical results must therefore handle an inconsistent feature representation across verticals and, potentially, a vertical-specific relationship between features and relevance. We present three general approaches that address these challenges in different ways and compare their results across a set of 13 verticals and 1070 queries. We show that the best approaches are those that allow the learning algorithm to learn a vertical-specific relationship between features and relevance.
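The two challenges can be made concrete with a small sketch. It is illustrative only and is not taken from the paper: the vertical names, the feature names, and both helper functions are hypothetical. It contrasts a shared representation, where each feature has one position (and so one learned weight) for all verticals, with a vertical-specific representation, where each feature is copied once per vertical so that a linear learner could fit a separate weight for each vertical. In both cases a feature that a vertical does not have is assumed to be filled with zero.

```python
from typing import Dict, List

# Hypothetical verticals and features, chosen only for illustration.
VERTICALS: List[str] = ["news", "images"]
FEATURE_NAMES: List[str] = ["query_click_rate", "has_recent_articles"]


def shared_vector(features: Dict[str, float]) -> List[float]:
    """One position per feature, shared by all verticals.

    A feature the vertical lacks is filled with 0.0 (an assumption).
    """
    return [features.get(name, 0.0) for name in FEATURE_NAMES]


def vertical_specific_vector(vertical: str, features: Dict[str, float]) -> List[float]:
    """One copy of every feature per vertical.

    Only the block belonging to `vertical` is filled; all other blocks are
    zero, so a linear model can learn a different weight per vertical.
    """
    vector: List[float] = []
    for v in VERTICALS:
        for name in FEATURE_NAMES:
            value = features.get(name, 0.0) if v == vertical else 0.0
            vector.append(value)
    return vector


news_features = {"query_click_rate": 0.4, "has_recent_articles": 1.0}
image_features = {"query_click_rate": 0.7}  # no notion of "recent articles"

print(shared_vector(image_features))
print(vertical_specific_vector("news", news_features))
print(vertical_specific_vector("images", image_features))
```

The shared form keeps the model small but forces a feature to mean the same thing for every vertical. The vertical-specific form multiplies the feature count by the number of verticals in exchange for allowing vertical-specific weights.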
