Distributed Bandits: Probabilistic Communication on d-regular Graphs
[1] Csaba Szepesvari, et al. Bandit Algorithms, 2020.
[2] Romain Laroche, et al. Decentralized Exploration in Multi-Armed Bandits, 2018, ICML.
[3] Naomi Ehrich Leonard, et al. A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem, 2020 European Control Conference (ECC).
[4] Alessandro Lazaric, et al. Fighting Boredom in Recommender Systems with Linear Reinforcement Learning, 2018, NeurIPS.
[5] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[6] Varun Kanade, et al. Decentralized Cooperative Stochastic Bandits, 2018, NeurIPS.
[7] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[8] Tamar Keasar, et al. Bees in two-armed bandit situations: foraging choices and possible decision mechanisms, 2002.
[9] Vaibhav Srivastava, et al. Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits, 2020, Automatica.
[10] Naomi Ehrich Leonard, et al. Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem, 2019 18th European Control Conference (ECC).
[11] Christos Dimitrakakis, et al. Algorithms for Differentially Private Multi-Armed Bandits, 2015, AAAI.
[12] Naomi Ehrich Leonard, et al. Heterogeneous Explore-Exploit Strategies on Multi-Star Networks, 2021, IEEE Control Systems Letters.
[13] Kaito Ariu, et al. Optimal Algorithms for Multiplayer Multi-Armed Bandits, 2019, AISTATS.
[14] Naomi Ehrich Leonard, et al. Distributed Learning: Sequential Decision Making in Resource-Constrained Environments, 2020, arXiv.
[15] Vaibhav Srivastava, et al. Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information, 2018 IEEE Conference on Decision and Control (CDC).
[16] Aditya Gopalan, et al. Collaborative learning of stochastic bandits over a social network, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[17] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part II: Markovian rewards, 1987.
[18] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Foundations and Trends in Machine Learning.
[19] Joelle Pineau, et al. Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis, 2018, MLHC.
[20] Iain D. Couzin, et al. Signalling and the Evolution of Cooperative Foraging in Dynamic Environments, 2011, PLoS Computational Biology.
[21] Sanmay Das, et al. Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits, 2017, IJCAI.
[22] H. Robbins. Some aspects of the sequential design of experiments, 1952.