Effective query expansion for federated search

While query expansion techniques have been shown to improve retrieval performance in a centralized setting, they have not been well studied in a federated setting. In this paper, we consider how query expansion may be adapted to federated environments and propose several new methods in which focused expansions are applied selectively to produce a specific query for each source (or set of sources). On a number of different testbeds, we show that focused query expansion can significantly outperform the previously proposed global expansion method and, contrary to earlier work, that query expansion can improve performance over standard federated retrieval. These findings motivate further research into different methods of query expansion, and into other forms of system and user interaction, in order to continue improving the performance of interactive federated search systems.
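
To make the contrast between global and focused expansion concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. It assumes each source is represented by a small sample of its documents and uses a deliberately crude term-frequency heuristic to choose expansion terms. The names `expansion_terms`, `global_expansion`, `focused_expansion`, and the toy `samples` data are all hypothetical, and the selective step (deciding which sources receive an expansion at all) is not modelled.

```python
from collections import Counter


def expansion_terms(query, docs, k=2):
    """Return the k most frequent non-query terms from docs that share a query term.

    Assumption: this frequency count is a toy stand-in for a real
    pseudo-relevance feedback model.
    """
    q_terms = set(query.lower().split())
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if q_terms & set(tokens):  # treat any matching document as feedback
            counts.update(t for t in tokens if t not in q_terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def global_expansion(query, samples, k=2):
    """Build one expanded query from all samples pooled, and send it to every source."""
    pooled = [doc for docs in samples.values() for doc in docs]
    expanded = query.split() + expansion_terms(query, pooled, k)
    return {source: expanded for source in samples}


def focused_expansion(query, samples, k=2):
    """Build a separate expanded query per source, using only that source's sample."""
    return {
        source: query.split() + expansion_terms(query, docs, k)
        for source, docs in samples.items()
    }


# Hypothetical sampled documents for two sources.
samples = {
    "medical": ["jaguar bite wound treatment", "jaguar wound infection"],
    "automotive": ["jaguar engine repair manual", "jaguar engine warranty"],
}

for name, expand in [("global", global_expansion), ("focused", focused_expansion)]:
    print(name, expand("jaguar", samples))
```

In this toy setting the global variant sends the same mixed query to both sources, while the focused variant gives each source terms drawn from its own vocabulary.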
