Portfolio theory of information retrieval

This paper studies document ranking under uncertainty. It is tackled in a general situation where the relevance predictions of individual documents have uncertainty, and are dependent between each other. Inspired by the Modern Portfolio Theory, an economic theory dealing with investment in financial markets, we argue that ranking under uncertainty is not just about picking individual relevant documents, but about choosing the right combination of relevant documents. This motivates us to quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance; the latter serves as a measure of risk, which was rarely studied for document ranking in the past. Through the analysis of the mean and variance, we show that an optimal rank order is the one that balancing the overall relevance (mean) of the ranked list against its risk level (variance). Based on this principle, we then derive an efficient document ranking algorithm. It generalizes the well-known probability ranking principle (PRP) by considering both the uncertainty of relevance predictions and correlations between retrieved documents. Moreover, the benefit of diversification is mathematically quantified; we show that diversifying documents is an effective way to reduce the risk of document ranking. Experimental results in text retrieval confirm performance.

[1]  Jun Wang,et al.  Unifying user-based and item-based collaborative filtering approaches by similarity fusion , 2006, SIGIR.

[2]  David R. Karger,et al.  Less is More Probabilistic Models for Retrieving Fewer Relevant Documents , 2006 .

[3]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[4]  Stephen Tomlinson Early precision measures: implications from the downside of blind feedback , 2006, SIGIR '06.

[5]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[6]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[7]  A. Zellner Bayesian Estimation and Prediction Using Asymmetric Loss Functions , 1986 .

[8]  John D. Lafferty,et al.  A risk minimization framework for information retrieval , 2006, Inf. Process. Manag..

[9]  W. Bruce Croft,et al.  Indri at TREC 2005: Terabyte Track , 2005, TREC.

[10]  Ingemar J. Cox,et al.  Risky business: modeling and exploiting uncertainty in information retrieval , 2009, SIGIR.

[11]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[12]  Falk Scholer,et al.  A comparison of evaluation measures given how users perform on search tasks , 2007 .

[13]  Charles L. A. Clarke,et al.  Novelty and diversity in information retrieval evaluation , 2008, SIGIR '08.

[14]  S. Robertson The probability ranking principle in IR , 1997 .

[15]  M. E. Maron,et al.  On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.

[16]  Nicholas J. Belkin,et al.  Ranking in Principle , 1978, J. Documentation.

[17]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[18]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[19]  Stephen E. Robertson,et al.  Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.

[20]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[21]  John Riedl,et al.  Item-based collaborative filtering recommendation algorithms , 2001, WWW '01.

[22]  Jun Wang,et al.  Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval , 2009, ECIR.

[23]  Michael D. Gordon,et al.  A Utility Theoretic Examination of the Probability Ranking Principle in Information Retrieval. , 1991 .

[24]  Yun Zhou,et al.  Indri at TREC 2005: Terabyte Track (Notebook Version) , 2005 .

[25]  E. Elton Modern portfolio theory and investment analysis , 1981 .