Fast and Effective Query Refinement

Query Refinement is an essential information retrieval tool that interactively recommends new terms related to a particular query. This paper introduces concept recall, an experimental measure of an algorithm's ability to suggest terms humans have judged to be semantically related to an information need. This study uses precision improvement experiments to measure the ability of an algorithm to produce single term query modifications that predict a user's information need as partially encoded by the query. An oracle algorithm produces ideal query modifications, providing a meaningful context for interpreting precision improvement results. This study also introduces RMAP, a fast and practical query refinement algorithm that refines multiple term queries by dynamically combining precomputed suggestions for single term queries. RMAP achieves accuracy comparable to a much slower algorithm, although both RMAP and the slower algorithm lag behind the best possible term suggestions offered by the oracle. We believe RMAP is fast enough to be integrated into present-day Internet search engines: RMAP computes 100 term suggestions for a 160,000 document collection in 15 ms on a low-end PC.
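As a rough illustration of the two ideas named above, the following Python sketch combines precomputed single-term suggestion lists for a multiple term query and scores the result against human-judged terms. It is not the RMAP algorithm itself: the abstract does not state how suggestions are combined or exactly how concept recall is computed, so the score-summing rule, the recall formula (fraction of judged terms that were suggested), the `PRECOMPUTED` table, and the names `refine` and `concept_recall` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed table: query term -> {suggested term: score}.
# In a real system this would be built offline from the document collection.
PRECOMPUTED = {
    "jaguar": {"car": 0.9, "cat": 0.8, "speed": 0.2},
    "engine": {"car": 0.7, "motor": 0.6, "speed": 0.5},
}


def refine(query_terms, table, k=100):
    """Combine per-term suggestion lists at query time.

    Assumption: scores are combined by simple summation; the paper's
    actual combination rule may differ.
    """
    combined = defaultdict(float)
    for term in query_terms:
        for suggestion, score in table.get(term, {}).items():
            # Do not suggest terms that are already in the query.
            if suggestion not in query_terms:
                combined[suggestion] += score
    # Highest combined score first; ties broken alphabetically.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [suggestion for suggestion, _ in ranked[:k]]


def concept_recall(suggested, judged_related):
    """Fraction of human-judged related terms that appear in the suggestions.

    Assumption: this set-based formula is one plausible reading of the
    measure described in the abstract.
    """
    judged = set(judged_related)
    if not judged:
        return 0.0
    return len(judged & set(suggested)) / len(judged)


suggestions = refine(["jaguar", "engine"], PRECOMPUTED, k=3)
recall = concept_recall(suggestions, ["car", "motor"])
```

Because all per-term lists are computed ahead of time, the query-time work in a scheme like this is limited to merging a few short lists, which is consistent with the speed the abstract reports.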
