Paper information - From federated to aggregated search

From federated to aggregated search

Federated search refers to the brokered retrieval of content from a set of auxiliary retrieval systems instead of from a single, centralized retrieval system. Federated search tasks occur in, for example, digital libraries (where documents from several retrieval systems must be seamlessly merged) or peer-to-peer information retrieval (where documents distributed across a network of local indexes must be retrieved). In the context of web search, aggregated search refers to the integration of non-web content (e.g. images, videos, news articles, maps, tweets) into a web search result page. This is in contrast with classic web search where users are presented with a ranked list consisting exclusively of general web documents. As in other federated search situations, the non-web content is often retrieved from auxiliary retrieval systems (e.g. image or video databases, news indexes). Although aggregated search can be seen as an instance of federated search, several aspects make aggregated search a unique and compelling research topic. These include large sources of evidence (e.g. click logs) for deciding what non-web items to return, constrained interfaces (e.g. mobile screens), and a very heterogeneous set of available auxiliary resources (e.g. images, videos, maps, news articles). Each of these aspects introduces problems and opportunities not addressed in the federated search literature. Aggregated search is an important future research direction for information retrieval. All major search engines now provide aggregated search results. As the number of available auxiliary resources grows, deciding how to effectively surface content from each will become increasingly important. The goal of this tutorial is to provide an overview of federated search and aggregated search techniques for an intermediate information retrieval researcher. At the same time, the content will be valuable for practitioners in industry. 
We will take the audience through the most influential work in these areas and describe how it relates to real-world aggregated search systems. We will also list some of the new challenges confronted in aggregated search and discuss directions for future work.
