Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents, where each agent learns one of $M$ stochastic multi-armed bandits with the goal of minimizing the group cumulative regret. We develop decentralized algorithms that facilitate collaboration among the agents under two scenarios, and we characterize their performance by deriving per-agent and group regret upper bounds. We also prove lower bounds on the group regret in this setting, which demonstrate that the proposed algorithms are near-optimal.
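To make the setting concrete, the following is a minimal simulation sketch: $N$ agents are each assigned one of $M$ $K$-armed Bernoulli bandit instances and independently run UCB1, with the group cumulative (pseudo-)regret summed over agents. This is only an illustrative non-collaborative baseline; the assignment rule (`agent % M`), the Bernoulli arm model, and all parameter values are assumptions for illustration, not the paper's algorithms.

```python
import math
import random


def ucb1_group_regret(N=4, M=2, K=3, T=2000, seed=0):
    """Baseline: N agents, each assigned one of M K-armed Bernoulli bandits,
    each running UCB1 independently (no collaboration).
    Returns the group cumulative pseudo-regret summed over all agents."""
    rng = random.Random(seed)
    # Arm means for each of the M bandit instances (illustrative, random).
    means = [[rng.random() for _ in range(K)] for _ in range(M)]
    regret = 0.0
    for agent in range(N):
        task = agent % M              # hypothetical agent-to-bandit assignment
        mu = means[task]
        best = max(mu)
        counts = [0] * K
        sums = [0.0] * K
        for t in range(1, T + 1):
            if t <= K:
                arm = t - 1           # initialization: play each arm once
            else:
                # UCB1 index: empirical mean + exploration bonus
                arm = max(range(K), key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            reward = 1.0 if rng.random() < mu[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - mu[arm]  # pseudo-regret increment
    return regret
```

Collaborative algorithms of the kind studied here aim to beat this baseline by letting agents assigned to the same bandit instance share information, so that exploration cost is split rather than duplicated.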
