Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Aggregating search results from a variety of heterogeneous sources, so-called verticals, such as news, image and video, into a single interface is a popular paradigm in web search. Current approaches to evaluating the effectiveness of aggregated search systems reward systems that return highly relevant verticals for a given query, where this relevance is assessed under different assumptions. It is difficult to evaluate or compare such systems without fully understanding the relationship between these underlying assumptions. To address this, we present a formal analysis and a set of extensive user studies that investigate the effects of the various assumptions made when assessing query vertical relevance. More than 20,000 assessments on 44 search tasks across 11 verticals were collected through Amazon Mechanical Turk and subsequently analysed. Our results provide insights into various aspects of query vertical relevance, and allow us both to explain in more depth and to question the evaluation results published in the literature.
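The point that vertical relevance "is assessed under different assumptions", and that this affects how systems compare, can be illustrated with a toy example. The sketch below is hypothetical and is not the study's method: it assumes binary crowd judgements per vertical, treats a vertical as relevant when the share of assessors voting "relevant" reaches a threshold, and uses F1 over the selected verticals as the reward. The function names, votes, systems and thresholds are all invented for illustration.

```python
from typing import Dict, List, Set


def relevant_verticals(votes: Dict[str, List[int]], threshold: float) -> Set[str]:
    """Hypothetical rule: a vertical is relevant if the share of 1-votes reaches `threshold`."""
    return {v for v, vs in votes.items() if vs and sum(vs) / len(vs) >= threshold}


def selection_f1(selected: Set[str], relevant: Set[str]) -> float:
    """F1 of a system's selected verticals against the derived relevance labels."""
    hits = len(selected & relevant)
    if hits == 0:
        return 0.0  # also covers an empty selection or an empty relevant set
    precision = hits / len(selected)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical binary judgements (1 = relevant) from five assessors for one query.
votes = {
    "news": [1, 1, 1, 1, 0],   # 80% of assessors say relevant
    "image": [1, 1, 0, 0, 0],  # 40%
    "video": [0, 0, 0, 0, 0],  # 0%
}
system_a = {"news", "image"}  # hypothetical system showing two verticals
system_b = {"news"}           # hypothetical system showing one vertical

for threshold in (0.3, 0.5):
    rel = relevant_verticals(votes, threshold)
    print(threshold, sorted(rel),
          round(selection_f1(system_a, rel), 3), round(selection_f1(system_b, rel), 3))
# 0.3 ['image', 'news'] 1.0 0.667
# 0.5 ['news'] 0.667 1.0
```

Under the lenient threshold system A scores higher, while under the stricter one system B does, even though the underlying judgements are identical. This is the kind of assumption-dependence that makes published comparisons hard to interpret.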
