Linear Submodular Bandits and their Application to Diversified Retrieval

Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval. We present an algorithm, called LSBGREEDY, and prove that it efficiently converges to a near-optimal model. As a case study, we apply our approach to personalized news recommendation, where the system must recommend small sets of news articles selected from tens of thousands of available articles each day. In a live user study, we found that LSBGREEDY significantly outperforms existing online learning approaches.
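To make the setting concrete, the following is a minimal sketch (not the paper's exact algorithm) of one round of greedy set selection in the spirit of LSBGREEDY: articles are represented by probabilistic topic-coverage vectors, the utility of adding an article is its marginal coverage gain weighted by an unknown linear parameter, and an upper-confidence bonus from a ridge-regression estimate drives exploration. All variable names and the specific coverage model are illustrative assumptions.

```python
import numpy as np

def lsb_greedy_round(articles, M, b, alpha, k):
    """One round of greedy UCB selection, LSBGreedy-style (illustrative sketch).

    articles: list of per-topic coverage vectors, each of shape (d,)
    M, b:     ridge-regression statistics (d x d matrix, length-d vector)
    alpha:    exploration weight
    k:        number of articles to select
    Marginal gains shrink as topics become covered, which is what makes
    the utility submodular.
    """
    d = len(b)
    Minv = np.linalg.inv(M)
    w = Minv @ b                      # ridge estimate of topic weights
    covered = np.zeros(d)             # current probabilistic topic coverage
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i, x in enumerate(articles):
            if i in chosen:
                continue
            # marginal gain of article i under probabilistic coverage
            gain = (1.0 - covered) * x
            # estimated marginal utility plus confidence bonus
            ucb = w @ gain + alpha * np.sqrt(gain @ Minv @ gain)
            if ucb > best_score:
                best, best_score = i, ucb
        chosen.append(best)
        covered = covered + (1.0 - covered) * articles[best]
    return chosen
```

After user feedback on each selected article, the regression statistics would be updated with the marginal-gain features (e.g. `M += np.outer(gain, gain)` and `b += reward * gain`), so the weight estimate and confidence widths adapt online.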
