Learning in a pairwise term-term proximity framework for information retrieval

Traditional ad hoc retrieval models do not take into account the closeness, or proximity, of terms. Document scores in these models are based primarily on the occurrence or non-occurrence of query-terms, each considered independently of the others. Intuitively, documents in which query-terms occur close together should be ranked higher than documents in which the query-terms appear far apart. This paper outlines several term-term proximity measures and develops an intuitive framework in which they can be used to fully model the proximity of all query-terms for a particular topic. As useful proximity functions may be constructed from many proximity measures, we use a learning approach that combines proximity measures into a useful proximity function within the framework. An evaluation of the best proximity functions shows a significant improvement over the baseline ad hoc retrieval model and over other, more recent methods that use a single proximity measure.
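
To make the idea of a pairwise term-term proximity measure concrete, the following is a minimal sketch of one simple measure: the minimum distance between occurrences of each pair of query-terms, averaged over all pairs present in the document. The function names, the whitespace tokenisation, and the choice to average over pairs are assumptions made for illustration; this is not the learned proximity function evaluated in the paper.

```python
from itertools import combinations


def term_positions(tokens):
    """Map each term to the list of positions at which it occurs."""
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    return positions


def min_dist(pos_a, pos_b):
    """Smallest absolute distance between any occurrence of a and any of b."""
    return min(abs(i - j) for i in pos_a for j in pos_b)


def avg_pairwise_min_dist(query_terms, tokens):
    """Average min_dist over all pairs of distinct query-terms in the document.

    Returns None when fewer than two distinct query-terms occur, since no
    pairwise measure is defined in that case (an illustrative convention).
    """
    positions = term_positions(tokens)
    present = sorted({t for t in query_terms if t in positions})
    pairs = list(combinations(present, 2))
    if not pairs:
        return None
    total = sum(min_dist(positions[a], positions[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical documents, tokenised on whitespace for simplicity.
query = ["information", "retrieval"]
doc_close = "information retrieval models rank documents".split()
doc_far = "information about many other things before retrieval".split()

close_score = avg_pairwise_min_dist(query, doc_close)
far_score = avg_pairwise_min_dist(query, doc_far)
```

A smaller value indicates that the query-terms occur closer together, so under the intuition above `doc_close` would be favoured over `doc_far`. How such a value is turned into a score contribution, and how several measures are combined, is the part the paper addresses with a learning approach.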
