Learning diverse rankings with multi-armed bandits

Algorithms for learning to rank Web documents usually assume a document's relevance is independent of other documents. This leads to learned ranking functions that produce rankings with redundant results. In contrast, user studies have shown that diversity at high ranks is often preferred. We present two online learning algorithms that directly learn a diverse ranking of documents based on users' clicking behavior. We show that these algorithms minimize abandonment, or alternatively, maximize the probability that a relevant document is found in the top k positions of a ranking. Moreover, one of our algorithms asymptotically achieves optimal worst-case performance even if users' interests change.
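
The abstract describes the approach only at a high level. As one plausible illustration of the "one bandit per rank position, rewarded by clicks" idea suggested by the title and abstract, the sketch below keeps a separate UCB1 bandit for each of the top k ranks and credits a bandit only when its own selection was shown at its rank and clicked. This is a hedged, hypothetical sketch, not the paper's actual algorithms: the UCB1 class, the duplicate-handling rule, the user_clicks simulator, and all names and parameters are assumptions introduced for illustration.

```python
"""Illustrative ranked-bandits-style sketch: one UCB1 bandit per rank position,
updated from simulated user clicks. All details here are assumptions made for
illustration, not the paper's implementation."""
import math
import random


class UCB1:
    """Standard UCB1 bandit over a fixed set of arms (candidate documents)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # times each arm was played
        self.values = [0.0] * n_arms   # running mean reward per arm
        self.t = 0                     # total selections made

    def select(self):
        self.t += 1
        # Play each arm once before relying on the confidence bound.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        ucb = [v + math.sqrt(2.0 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


def ranked_bandits(n_docs, k, user_clicks, n_users):
    """Run one bandit per rank for n_users simulated users.

    user_clicks(ranking) returns the clicked position, or None on abandonment.
    """
    bandits = [UCB1(n_docs) for _ in range(k)]
    for _ in range(n_users):
        ranking, chosen = [], []
        for i in range(k):
            doc = bandits[i].select()
            chosen.append(doc)
            # If the bandit proposes a duplicate, fill the slot with any
            # unused document; the bandit is credited only if its own
            # choice was actually shown at rank i.
            if doc in ranking:
                doc = next(d for d in range(n_docs) if d not in ranking)
            ranking.append(doc)
        clicked = user_clicks(ranking)
        for i in range(k):
            reward = 1.0 if (clicked == i and ranking[i] == chosen[i]) else 0.0
            bandits[i].update(chosen[i], reward)
    return bandits


if __name__ == "__main__":
    # Toy click model: two user populations with different relevant documents;
    # a user clicks the first relevant document in the top k, else abandons.
    def user_clicks(ranking):
        relevant = {0} if random.random() < 0.6 else {1}
        for pos, doc in enumerate(ranking):
            if doc in relevant:
                return pos
        return None

    random.seed(0)
    bandits = ranked_bandits(n_docs=10, k=3, user_clicks=user_clicks, n_users=5000)
    # Report each rank's current best arm; a diverse ranking should tend to
    # cover both populations (doc 0 high, doc 1 next) rather than repeating.
    print([max(range(10), key=lambda a: b.values[a]) for b in bandits])
```

In this toy setting, minimizing abandonment means covering both user populations within the top k: placing the majority-relevant document first and the minority-relevant one next reaches more users than repeating near-duplicates of the majority result.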
