LiMe: linear methods for pseudo-relevance feedback

Retrieval effectiveness has traditionally been pursued by improving ranking models and by enriching the evidence about the information need beyond the original query. One successful way to produce better rankings is to expand the original query. Pseudo-relevance feedback (PRF) has proved effective for this task when no explicit user judgements about the initial ranking are available. This family of techniques obtains expansion terms from the top documents retrieved by the original query. PRF techniques usually exploit the relationship between terms and documents or between terms and queries. In this paper, we explore the use of linear methods for pseudo-relevance feedback. We present a novel formulation of the PRF task as a matrix decomposition problem, which we call LiMe. This factorisation involves computing an inter-term similarity matrix that is used to expand the original query. We solve the proposed decomposition using regularised linear least squares regression under non-negativity constraints. We compare LiMe against strong state-of-the-art PRF baselines on five datasets and show that our proposal achieves improvements in terms of MAP, nDCG and robustness index.
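To make the idea of an inter-term similarity matrix more concrete, the following is a minimal sketch rather than the formulation used in the paper. It assumes a small matrix X whose first row is the query and whose remaining rows are the top retrieved documents, with one column per term, and it approximates X by XW with W non-negative. The zero-diagonal constraint, the ridge (l2) penalty, the projected gradient solver, and the names `fit_term_similarity` and `expand_query` are all assumptions introduced for illustration.

```python
import numpy as np


def fit_term_similarity(X, l2=0.1, n_iter=500):
    """Approximately minimise 0.5*||X - XW||_F^2 + 0.5*l2*||W||_F^2
    subject to W >= 0 and diag(W) = 0, using projected gradient descent.

    The objective and constraints are illustrative assumptions,
    not the exact problem stated in the paper.
    """
    n_terms = X.shape[1]
    gram = X.T @ X
    # Step size 1/L, where L bounds the curvature of the smooth objective.
    step = 1.0 / (np.linalg.eigvalsh(gram)[-1] + l2)
    W = np.zeros((n_terms, n_terms))
    for _ in range(n_iter):
        grad = gram @ W - gram + l2 * W
        W = W - step * grad
        np.maximum(W, 0.0, out=W)   # project onto the non-negative orthant
        np.fill_diagonal(W, 0.0)    # forbid a term from explaining itself
    return W


def expand_query(X, W, query_row=0, k=3):
    """Score every term by propagating the query row through W."""
    scores = X[query_row] @ W
    top = np.argsort(-scores)[:k]
    return [(int(j), float(scores[j])) for j in top]


# Toy data: row 0 is the query, rows 1-3 are pseudo-relevant documents.
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
])
W = fit_term_similarity(X)
expansion = expand_query(X, W, k=2)
```

Here `expansion` holds the indices and scores of the two highest-scoring terms for the query row. A library solver for non-negative regularised regression could replace the hand-written loop; the loop is used only to keep the sketch self-contained.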
