Multiagent Low-Dimensional Linear Bandits

We study a multiagent stochastic linear bandit with side information, parameterized by an unknown vector <inline-formula><tex-math notation="LaTeX">$\theta ^* \in \mathbb {R}^{d}$</tex-math></inline-formula>. The side information consists of a finite collection of low-dimensional subspaces, one of which contains <inline-formula><tex-math notation="LaTeX">$\theta ^*$</tex-math></inline-formula>. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm in which agents communicate subspace indices with each other and each agent plays a projected variant of LinUCB on the corresponding low-dimensional subspace. By distributing the search for the optimal subspace across agents, and by having each agent learn the unknown vector within the corresponding low-dimensional subspace, we show that the per-agent finite-time regret is much smaller than when agents do not communicate. Finally, we complement these results with simulations.
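To make the idea of playing LinUCB on a candidate subspace concrete, the following is a minimal single-agent sketch, not the algorithm analyzed in the paper. It assumes the subspace is given by an orthonormal basis (the rows of `U`), projects each arm onto that basis, and runs a ridge-regression upper-confidence rule in the projected coordinates. The class name `ProjectedLinUCB`, the helpers `dot`, `project`, and `run_demo`, the constant confidence multiplier `beta`, and the toy arm set are all illustrative assumptions; the inter-agent exchange of subspace indices is omitted.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(U, x):
    """Coordinates of x in the subspace whose orthonormal basis is the rows of U."""
    return [dot(u, x) for u in U]


class ProjectedLinUCB:
    """Hypothetical sketch: LinUCB run in the m-dimensional coordinates of one subspace."""

    def __init__(self, U, lam=1.0, beta=1.0):
        self.U = U
        self.beta = beta  # assumed constant confidence multiplier, not a tuned radius
        m = len(U)
        # Inverse of the regularized Gram matrix V = lam * I (m x m).
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(m)]
                      for i in range(m)]
        self.b = [0.0] * m  # running sum of reward * projected arm

    def _matvec(self, z):
        return [dot(row, z) for row in self.V_inv]

    def estimate(self):
        """Ridge estimate of the unknown vector in subspace coordinates."""
        return self._matvec(self.b)

    def select(self, arms):
        """Index of the arm with the largest upper confidence bound."""
        theta_hat = self.estimate()

        def ucb(x):
            z = project(self.U, x)
            width = math.sqrt(max(dot(z, self._matvec(z)), 0.0))
            return dot(theta_hat, z) + self.beta * width

        return max(range(len(arms)), key=lambda i: ucb(arms[i]))

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison) update of V_inv, then accumulate b."""
        z = project(self.U, x)
        Vz = self._matvec(z)
        denom = 1.0 + dot(z, Vz)
        m = len(z)
        for i in range(m):
            for j in range(m):
                self.V_inv[i][j] -= Vz[i] * Vz[j] / denom
        for i in range(m):
            self.b[i] += reward * z[i]


def run_demo(rounds=500, seed=0):
    """Toy run in d = 3 where the true vector lies in span{e1, e2} (assumed setup)."""
    rng = random.Random(seed)
    U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    theta_star = [1.0, 0.5, 0.0]
    arms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
    agent = ProjectedLinUCB(U)
    for _ in range(rounds):
        i = agent.select(arms)
        reward = dot(theta_star, arms[i]) + rng.gauss(0.0, 0.1)
        agent.update(arms[i], reward)
    return agent.estimate()


if __name__ == "__main__":
    print(run_demo())
```

Because the regression runs in the subspace coordinates, the estimator maintains an m-by-m matrix rather than a d-by-d one, which is the intuition behind learning in the low-dimensional subspace once the right index has been identified.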
