Selective Term Proximity Scoring Via BP-ANN

When two query terms occur together in a document, a close relationship between them, and between them and the document itself, is more likely if they appear in nearby positions. However, ranking functions that incorporate term proximity (TP) require larger indexes than traditional document-level indexing, which slows query processing. Previous studies also show that this technique is not effective for all types of queries. Here we propose a document ranking model that decides, based on a set of query features, for which queries a proximity-based ranking would be beneficial. We take a machine learning approach (a BP-ANN) to determine whether applying TP will pay off for a given query. Experiments show that the proposed model improves ranking quality while also reducing the overhead incurred by computing TP statistics.

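The sketch below illustrates the overall idea under stated assumptions: a small feed-forward network (standing in for the paper's BP-ANN, with hand-set rather than trained weights) inspects a few query-level features and gates whether the proximity component is computed at all. The feature names, the thresholding at 0.5, and the additive BM25-plus-proximity combination are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of selective term-proximity scoring, assuming hypothetical
# feature definitions and an untrained stand-in for the paper's BP-ANN.

import math
import numpy as np


def query_features(query_terms, collection_stats):
    """Hypothetical query-level features (illustrative only)."""
    n_docs = collection_stats["n_docs"]
    dfs = [collection_stats["df"].get(t, 0) for t in query_terms]
    idfs = [math.log((n_docs + 1) / (df + 1)) for df in dfs]
    return np.array([
        len(query_terms),          # query length
        sum(idfs) / len(idfs),     # mean IDF of query terms
        max(idfs) - min(idfs),     # IDF spread
    ])


class TinyBPANN:
    """One-hidden-layer network; in the paper's setting the weights would be
    learned by back-propagation on labelled queries (not trained here)."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def predict_use_tp(self, x):
        h = np.tanh(self.w1 @ x + self.b1)
        p = 1.0 / (1.0 + math.exp(-(float(self.w2 @ h) + self.b2)))
        return p > 0.5             # True -> apply proximity scoring


def rank_score(doc, query_terms, bm25, tp_bonus, use_tp):
    """Document score: BM25 alone, or BM25 plus a proximity bonus,
    so the TP cost is only paid when the selector chooses it."""
    s = bm25(doc, query_terms)
    if use_tp:
        s += tp_bonus(doc, query_terms)
    return s
```

In use, `predict_use_tp` is evaluated once per query before retrieval, so queries routed to the BM25-only path never touch the positional (term-pair) index, which is where the efficiency saving described in the abstract comes from.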