Evaluating large-scale distributed vertical search

Aggregating search results from a variety of distributed heterogeneous sources, i.e. so-called verticals, such as news, images, video and blogs, into a single interface has become a popular paradigm in large-scale web search. As various distributed vertical search (also known as aggregated search) techniques have been proposed, it is crucial to be able to properly evaluate these systems on a large-scale standard test set. A test collection for aggregated search requires a number of verticals, each populated by items (e.g. documents, images) of that vertical type; a set of topics expressing information needs relating to one or more verticals; and relevance assessments indicating the relevance of the items and their associated verticals to each topic. Building a large-scale test collection for aggregated search is costly in terms of time and resources. In this paper, we propose a methodology for building such a test collection by reusing existing test collections, which allows the investigation of aggregated search approaches. We report on experiments, based on twelve simulated aggregated search systems, that show the impact of misclassifying items into verticals on the evaluation of those systems.
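The core experimental idea described above — injecting classification noise into the item-to-vertical labels and observing how an evaluation measure responds — can be sketched as follows. This is an illustrative simulation only, not the paper's actual methodology: the vertical names, the item set, the `misclassify` helper and the precision-style measure are all hypothetical stand-ins.

```python
import random

random.seed(0)

# Hypothetical ground-truth vertical labels for a small pool of items.
VERTICALS = ["news", "image", "video", "blog"]
truth = {f"item{i}": random.choice(VERTICALS) for i in range(20)}

def misclassify(labels, error_rate):
    """Flip each item's vertical label to a random wrong one with probability error_rate."""
    noisy = {}
    for item, vertical in labels.items():
        if random.random() < error_rate:
            noisy[item] = random.choice([v for v in VERTICALS if v != vertical])
        else:
            noisy[item] = vertical
    return noisy

def vertical_precision(ranked_items, labels, target_vertical, k=10):
    """Fraction of the top-k results whose (possibly noisy) label matches the target vertical."""
    top = ranked_items[:k]
    return sum(1 for item in top if labels[item] == target_vertical) / k

# A simulated system that ranks items it believes belong to the "news" vertical first.
ranking = [i for i, v in truth.items() if v == "news"] + \
          [i for i, v in truth.items() if v != "news"]

# As the misclassification rate grows, the measured score drifts away
# from the score computed against the true labels.
for rate in (0.0, 0.2, 0.4):
    noisy = misclassify(truth, rate)
    print(f"error_rate={rate:.1f}  precision@10={vertical_precision(ranking, noisy, 'news'):.2f}")
```

With a zero error rate the noisy labels coincide with the ground truth, so the score is unchanged; increasing the rate perturbs the assessments and hence the system's measured effectiveness, which is the effect the paper's experiments quantify on real test collections.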
