Query association surrogates for Web search

Collection sizes, query rates, and the number of users of Web search engines are all increasing, so there is continued demand for innovation in search services that meet users' information needs. In this article, we propose new techniques that add terms to documents with the goal of making searches more accurate. Our techniques are based on query association, in which queries are stored with the documents to which they are statistically highly similar. We show that adding query associations to documents improves the accuracy of Web topic-finding searches by up to 7%, and that it is an excellent complement to existing document-supplement techniques for site finding. We conclude that document surrogates derived from query association are a valuable new technique for accurate Web searching.
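The idea of query association can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: the TF-IDF-style `similarity` function, the parameters `top_n` (how many documents each query is associated with) and `max_per_doc` (how many queries a document may keep), and all function names are assumptions chosen for illustration. Each query is scored against every document, stored with its highest-scoring documents, and the stored queries for a document are joined to form a surrogate.

```python
import heapq
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and collection-wide document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def similarity(query, counts, doc_freq, num_docs):
    """TF-IDF-style score; an assumed stand-in for the statistical similarity measure."""
    score = 0.0
    for term in set(tokenize(query)):
        tf = counts.get(term, 0)
        if tf:
            score += (1 + math.log(tf)) * math.log(1 + num_docs / doc_freq[term])
    return score


def associate(docs, queries, top_n=2, max_per_doc=3):
    """Store each query with its top_n most similar documents.

    Each document keeps at most max_per_doc queries; when the limit is
    exceeded, the least similar stored query is dropped. Returns a dict
    mapping doc_id to a surrogate string built from the stored queries.
    """
    term_counts, doc_freq = build_index(docs)
    stored = defaultdict(list)  # doc_id -> min-heap of (score, seq, query)
    for seq, query in enumerate(queries):
        scored = [
            (similarity(query, term_counts[doc_id], doc_freq, len(docs)), doc_id)
            for doc_id in docs
        ]
        matches = [(s, d) for s, d in scored if s > 0]
        for score, doc_id in heapq.nlargest(top_n, matches):
            heap = stored[doc_id]
            heapq.heappush(heap, (score, seq, query))
            if len(heap) > max_per_doc:
                heapq.heappop(heap)  # discard the weakest association
    return {
        doc_id: " ".join(query for _, _, query in sorted(heap, reverse=True))
        for doc_id, heap in stored.items()
    }


# Hypothetical toy collection and query log.
docs = {
    "d1": "python snake habitat and diet",
    "d2": "python programming language tutorial",
    "d3": "cooking pasta recipes",
}
queries = ["python tutorial", "snake diet", "pasta recipes"]
surrogates = associate(docs, queries, top_n=1)
```

In this sketch a surrogate could be indexed on its own or appended to the document text before indexing; which of these is used, and the values of the two limits, are design choices that the sketch leaves open.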
