Distributed Bandits: Probabilistic Communication on d-regular Graphs

We study the decentralized multi-agent multi-armed bandit problem for agents that communicate over a network defined by a d-regular graph. Each edge of the graph carries probabilistic weight p, modeling communication links that fail independently with probability 1 − p. At each time step, every agent chooses an arm and receives the numerical reward associated with that arm. After each choice, an agent observes the most recent reward obtained by each of its neighbors, where each observation succeeds with probability p. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-level strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms, and we illustrate and validate the theoretical claims with numerical simulations.
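The setting described above can be sketched in a minimal simulation: cooperative UCB1 agents on a ring (a 2-regular graph), where each agent updates its arm statistics with its own reward and, with probability p, with each neighbor's last reward. This is an illustrative sketch, not the paper's algorithm; the arm means, graph, horizon, and UCB1 index are all assumptions chosen for the example.

```python
import math
import random

def simulate(num_agents=6, num_arms=3, horizon=500, p=0.7, seed=0):
    """Hypothetical sketch: UCB1 agents on a ring graph with lossy links."""
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8]        # assumed Bernoulli arm means
    best = max(means)
    # ring graph: each agent has exactly two neighbors (d = 2)
    nbrs = [[(i - 1) % num_agents, (i + 1) % num_agents]
            for i in range(num_agents)]
    counts = [[0] * num_arms for _ in range(num_agents)]   # observations per arm
    sums = [[0.0] * num_arms for _ in range(num_agents)]   # reward totals per arm
    regret = 0.0
    for t in range(1, horizon + 1):
        choices, rewards = [], []
        for i in range(num_agents):
            if 0 in counts[i]:     # play each arm once before using the index
                a = counts[i].index(0)
            else:                  # standard UCB1 index
                a = max(range(num_arms),
                        key=lambda k: sums[i][k] / counts[i][k]
                        + math.sqrt(2 * math.log(t) / counts[i][k]))
            r = 1.0 if rng.random() < means[a] else 0.0
            choices.append(a)
            rewards.append(r)
            regret += best - means[a]
        # update with own reward, and with each neighbor's reward
        # only when that communication link succeeds (probability p)
        for i in range(num_agents):
            counts[i][choices[i]] += 1
            sums[i][choices[i]] += rewards[i]
            for j in nbrs[i]:
                if rng.random() < p:
                    counts[i][choices[j]] += 1
                    sums[i][choices[j]] += rewards[j]
    return regret
```

Sweeping p from 0 to 1 in this sketch shows the qualitative effect studied in the paper: more reliable links give each agent more observations per round, which lowers group regret.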
