Term Proximity Scoring for Keyword-Based Retrieval Systems

This paper proposes combining term-proximity measurement with the Okapi probabilistic model. First, using the Okapi system, we investigate a distributed retrieval framework that computes the same relevance score as a single centralized index. Second, we apply a term-proximity scoring heuristic to the top documents returned by a keyword-based system, with the aim of enhancing retrieval performance. Experiments on the TREC8, TREC9 and TREC10 test collections show that the suggested approach is stable and generally tends to improve retrieval effectiveness, especially among the top-ranked documents.
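To make the second step concrete, the following is a minimal sketch of re-ranking the top documents of a keyword-based run with a proximity bonus. It is not the scoring formula used in the paper, which the abstract does not give. The function names (`proximity_bonus`, `rerank_top_k`), the inverse-square distance weighting, and the parameters `max_distance`, `k` and `weight` are all assumptions chosen for illustration. The base scores are assumed to come from some keyword-based model such as Okapi.

```python
from itertools import combinations


def proximity_bonus(doc_tokens, query_terms, max_distance=5):
    """Hypothetical heuristic: sum 1/d^2 over every pair of occurrences of
    two distinct query terms that lie within max_distance tokens."""
    # Record the token positions of each query term found in the document.
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)

    bonus = 0.0
    # Only pairs of *distinct* terms are considered, so d is never zero.
    for t1, t2 in combinations(sorted(positions), 2):
        for p1 in positions[t1]:
            for p2 in positions[t2]:
                d = abs(p1 - p2)
                if d <= max_distance:
                    bonus += 1.0 / (d * d)
    return bonus


def rerank_top_k(ranked, docs, query_terms, k=100, weight=1.0):
    """Re-score only the first k results of a keyword-based run.

    ranked: list of (doc_id, base_score), sorted by descending base_score.
    docs:   mapping doc_id -> list of tokens.
    """
    terms = set(query_terms)
    head = [
        (doc_id, score + weight * proximity_bonus(docs[doc_id], terms))
        for doc_id, score in ranked[:k]
    ]
    head.sort(key=lambda pair: pair[1], reverse=True)
    # The bonus is non-negative, so re-scored documents never drop below the tail.
    return head + ranked[k:]


# Toy data (invented for illustration): document "b" holds the query terms
# adjacent to each other, document "a" holds them seven tokens apart.
docs = {
    "a": ["term", "x", "x", "x", "x", "x", "x", "proximity"],
    "b": ["term", "proximity", "scoring"],
}
ranked = [("a", 2.0), ("b", 1.8)]
result = rerank_top_k(ranked, docs, ["term", "proximity"], k=2)
```

In this toy run, document "b" overtakes "a" because its query terms are adjacent, while the terms in "a" are further apart than `max_distance` and earn no bonus. Restricting the re-scoring to the first `k` results keeps the extra cost small and only changes the ordering near the top of the list, which is where the abstract reports the improvement.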
