Contextual Bandits in a Collaborative Environment

Contextual bandit algorithms provide principled online learning solutions for balancing exploration and exploitation when side information is available. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice is to estimate the unknown bandit parameters of each user independently. Unfortunately, this ignores dependencies among users and leads to suboptimal solutions, especially in applications with strong social components. In this paper, we develop a collaborative contextual bandit algorithm in which the adjacency graph among users is leveraged to share contexts and payoffs between neighboring users during online updating. We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms. Extensive experiments on a synthetic dataset and three large-scale real-world datasets verify the improvement of the proposed algorithm over several state-of-the-art contextual bandit algorithms.
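
The sharing mechanism described above is easiest to see in code. Below is a minimal sketch of one way to realize graph-based collaboration, assuming LinUCB-style per-user ridge-regression estimators and a row-normalized user adjacency matrix; the class and all names (`CollaborativeLinUCB`, `W`, `alpha`, `lam`) are illustrative assumptions, not the paper's exact estimator or notation.

```python
import numpy as np

class CollaborativeLinUCB:
    """Illustrative sketch: per-user LinUCB estimators that share observed
    contexts and payoffs with graph neighbors. W is assumed to be a
    row-normalized (e.g., symmetric) user adjacency matrix, with W[u, v]
    the influence weight between users u and v. This is a simplification
    for exposition, not the paper's exact method."""

    def __init__(self, W, dim, alpha=0.25, lam=1.0):
        self.W = W          # (n_users, n_users) influence weights
        self.alpha = alpha  # exploration strength
        n = W.shape[0]
        # Per-user ridge-regression sufficient statistics: A theta = b.
        self.A = np.array([lam * np.eye(dim) for _ in range(n)])
        self.b = np.zeros((n, dim))

    def _theta(self, u):
        # Ridge-regression point estimate for user u.
        return np.linalg.solve(self.A[u], self.b[u])

    def choose(self, u, arms):
        """Pick the arm (a row of `arms`) with the highest UCB score for
        user u, blending the user's estimate with its neighbors'."""
        n = self.W.shape[0]
        # Neighborhood-weighted parameter estimate for user u.
        theta_u = sum(self.W[u, v] * self._theta(v) for v in range(n))
        A_inv = np.linalg.inv(self.A[u])
        # Mean payoff estimate plus a confidence-width exploration bonus.
        scores = arms @ theta_u + self.alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        return int(np.argmax(scores))

    def update(self, u, x, reward):
        """Propagate the observed (context, payoff) pair to user u and,
        with down-weighted importance, to u's graph neighbors."""
        n = self.W.shape[0]
        for v in range(n):
            w = self.W[u, v]
            if w > 0:
                self.A[v] += w * np.outer(x, x)
                self.b[v] += w * reward * x
```

Compared with running an independent LinUCB instance per user, each observed payoff here also tightens the confidence ellipsoids of the observing user's neighbors, which is the intuition behind the improved regret bound claimed above.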
