论文信息 - Coordination without communication: optimal regret in two players multi-armed bandits

Coordination without communication: optimal regret in two players multi-armed bandits

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.

S'ebastien Bubeck | Thomas Budzinski | Sébastien Bubeck | Thomas Budzinski

[1] Ohad Shamir,et al. Multi-player bandits: a musical chairs approach , 2016, ICML 2016.

[2] Vianney Perchet,et al. SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits , 2018, NeurIPS.

[3] Hai Jiang,et al. Medium access in cognitive radio networks: A competitive multi-armed bandit framework , 2008, 2008 42nd Asilomar Conference on Signals, Systems and Computers.

[4] Ananthram Swami,et al. Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret , 2010, IEEE Journal on Selected Areas in Communications.

[5] Gábor Lugosi,et al. Multiplayer bandits without observing collision information , 2018, Math. Oper. Res..

[6] Yuval Peres,et al. Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without , 2020, COLT.

[7] Qing Zhao,et al. Distributed Learning in Multi-Armed Bandit With Multiple Players , 2009, IEEE Transactions on Signal Processing.

[8] Jacques Palicot,et al. Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings , 2017, CrownCom.

[9] Shie Mannor,et al. Concurrent Bandits and Cognitive Radio Networks , 2014, ECML/PKDD.

[10] Andreas Krause,et al. Multi-Player Bandits: The Adversarial Case , 2019, J. Mach. Learn. Res..