Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval

This paper concerns document ranking in information retrieval. The widely accepted probability ranking principle (PRP) states that, for optimal retrieval, documents should be ranked in order of decreasing probability of relevance. We present a new document ranking paradigm, arguing that a better, more general solution is to optimize the top-n ranked documents as a whole rather than ranking each document independently. Inspired by Modern Portfolio Theory in finance, we characterize a ranked list of documents by its expected overall relevance (mean) and its variance; the variance serves as a measure of risk, which has rarely been studied for document ranking. Through the analysis of the mean and variance, we show that an optimal rank order is the one that maximizes the overall relevance (mean) of the ranked list at a given risk level (variance). Based on this principle, we derive an efficient document ranking algorithm. It extends the PRP by considering both the uncertainty of relevance predictions and the correlations between retrieved documents. Furthermore, we quantify the benefits of diversification and show theoretically that diversifying documents is an effective way to reduce the risk of a document ranking. Experimental results on the collaborative filtering problem confirm the theoretical insights with improved recommendation performance, for example a gain of more than 300% over PRP-based ranking in user-based recommendation.
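The mean-variance principle described above can be illustrated with a small greedy ranker. This is a minimal sketch under stated assumptions, not the algorithm derived in the paper: the function name `mean_variance_rank`, the risk parameter `b`, the log-discounted position weights, and the exact form of the score are all assumptions chosen for illustration. At each rank position, the sketch picks the document whose predicted relevance, minus a penalty for its own variance and for its covariance with documents already selected, is largest. With `b = 0` it reduces to ordering by predicted relevance, as the PRP would.

```python
import numpy as np

def mean_variance_rank(mu, cov, n, b=1.0, weights=None):
    """Greedy top-n selection trading off expected relevance against risk.

    mu      : predicted relevance (mean) per document      [assumed input]
    cov     : covariance matrix of the relevance estimates [assumed input]
    b       : risk-aversion parameter (hypothetical); b = 0 gives mean-only ranking
    weights : per-position importance; defaults to a log discount (an assumption)
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = min(n, len(mu))
    if weights is None:
        # Position k (0-based) gets weight 1 / log2(k + 2).
        weights = 1.0 / np.log2(np.arange(2, n + 2))

    selected = []
    remaining = list(range(len(mu)))
    for k in range(n):
        best, best_score = None, -np.inf
        for i in remaining:
            # Penalty for correlation with documents already placed higher.
            corr_penalty = sum(weights[j] * cov[i, d] for j, d in enumerate(selected))
            score = mu[i] - b * weights[k] * cov[i, i] - 2.0 * b * corr_penalty
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Documents 0 and 1 are strongly correlated; document 2 is independent of both.
mu = [0.90, 0.85, 0.60]
cov = [[0.040, 0.038, 0.000],
       [0.038, 0.040, 0.000],
       [0.000, 0.000, 0.040]]

prp_like = mean_variance_rank(mu, cov, n=2, b=0.0)     # relevance only
risk_averse = mean_variance_rank(mu, cov, n=2, b=5.0)  # penalizes correlated picks
```

In this toy input, the risk-averse setting places the uncorrelated document second even though its predicted relevance is lower, which is the diversification effect the abstract refers to.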
