Positional relevance model for pseudo-relevance feedback

Pseudo-relevance feedback is an effective technique for improving retrieval results. Traditional feedback algorithms treat a whole feedback document as the unit from which to extract words for query expansion. This is not optimal, because a document may cover several different topics and thus contain much irrelevant information. In this paper, we study how to use the positions of terms in feedback documents to select those words that are focused on the query topic. We propose a positional relevance model (PRM) to address this problem in a unified probabilistic way. The proposed PRM extends the relevance model to exploit term positions and proximity, assigning more weight to words closer to query words, based on the intuition that such words are more likely to be related to the query topic. We develop two methods to estimate PRM based on different sampling processes. Experimental results on two large retrieval datasets show that the proposed PRM is effective and robust for pseudo-relevance feedback, significantly outperforming the relevance model in both document-based feedback and passage-based feedback.
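The proximity intuition can be illustrated with a minimal sketch. It is not either of the two PRM estimation methods, which the abstract does not specify. It assumes a Gaussian kernel over token distance with a hypothetical bandwidth `sigma`, and the function name `positional_term_weights` is illustrative only. Each non-query word in a single feedback document accumulates weight from every query-term occurrence, so words near query terms score higher than words far from them.

```python
import math
from collections import defaultdict


def positional_term_weights(doc_tokens, query_terms, sigma=25.0):
    """Weight each non-query word by its proximity to query-term occurrences.

    Assumption: a Gaussian kernel over token distance; `sigma` is a
    hypothetical bandwidth, not a value taken from the paper.
    """
    query_terms = set(query_terms)
    # Positions in the document where any query term occurs.
    query_positions = [i for i, tok in enumerate(doc_tokens) if tok in query_terms]
    if not query_positions:
        return {}

    weights = defaultdict(float)
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            continue  # only candidate expansion words are scored
        for j in query_positions:
            # Kernel value decays as the distance |i - j| grows.
            weights[tok] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))

    total = sum(weights.values())
    if total == 0:
        return {}
    # Normalize so the weights form a distribution over candidate words.
    return {word: value / total for word, value in weights.items()}


doc = "jaguar speed cat habitat filler filler filler filler filler filler car engine".split()
weights = positional_term_weights(doc, ["jaguar"], sigma=2.0)
for word, value in sorted(weights.items(), key=lambda kv: -kv[1])[:3]:
    print(word, round(value, 3))
```

In this toy document, "speed" sits next to the query word "jaguar" and receives far more weight than "engine", which appears ten positions later. A whole-document approach would treat both occurrences the same.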
