Composite retrieval of heterogeneous web search

Traditional search systems generally present a ranked list of documents as answers to user queries. Aggregated search systems instead return results from different and increasingly diverse verticals (image, video, news, etc.). For instance, many such search engines return both images and web documents in response to the query "flower". Aggregated search has become a very popular paradigm. In this paper, we go one step further and study a different search paradigm: composite retrieval. Rather than returning and merging results from different verticals, as aggregated search does, we propose to return a set of "bundles", where a bundle is composed of "cohesive" results from several verticals. For example, for the query "London Olympic", one bundle per sport could be returned, each containing results extracted from news, videos, images, or Wikipedia. Composite retrieval can promote exploratory search by helping users understand the diversity of results available for a specific query and decide what to explore in more detail. We propose and evaluate a variety of approaches to construct bundles that are relevant, cohesive, and diverse. Compared with three baselines (traditional "general web only" ranking, federated search ranking, and aggregated search), our evaluation results demonstrate significant performance improvement on a highly heterogeneous web collection.
