Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization

Despite significant interest and progress in decentralized multi-player multi-armed bandit (MP-MAB) problems in recent years, closing the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting has remained an open problem. In this paper, we propose BEACON (Batched Exploration with Adaptive COmmunicatioN), which closes this gap. BEACON achieves this with novel contributions in implicit communication and efficient exploration. For the former, we propose an adaptive differential communication (ADC) design that significantly improves implicit communication efficiency. For the latter, we develop a carefully crafted batched exploration scheme that enables incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize existing linear-reward MP-MAB problems, in which the system reward is always the sum of individually collected rewards, to a new MP-MAB problem in which the system reward is a general (nonlinear) function of the individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret bound. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas of the MAB literature, and the results in this paper suggest that this previously overlooked connection merits further investigation.
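To make the CUCB principle mentioned above concrete, the following is a minimal illustrative sketch of CUCB applied to a toy heterogeneous linear-reward MP-MAB instance: each (player, arm) pair keeps an empirical mean and a confidence bonus, and an offline oracle selects the collision-free assignment maximizing the sum of UCB indices. This is not BEACON itself (no batching, no ADC, and a brute-force oracle feasible only for tiny instances); the function name, the exploration constant 1.5, and the Bernoulli reward model are all illustrative assumptions.

```python
import itertools
import math
import random

def cucb_assignment(mu, n_players, horizon, seed=0):
    """Toy CUCB for a heterogeneous linear-reward MP-MAB (illustrative only).
    mu[p][k] = true mean reward of arm k for player p, unknown to the learner."""
    rng = random.Random(seed)
    n_arms = len(mu[0])
    counts = [[0] * n_arms for _ in range(n_players)]   # pulls per (player, arm)
    means = [[0.0] * n_arms for _ in range(n_players)]  # empirical means
    total = 0.0
    for t in range(1, horizon + 1):
        # UCB index per (player, arm); unexplored pairs get an infinite index
        # so every pair is tried at least once.
        def idx(p, k):
            if counts[p][k] == 0:
                return float("inf")
            return means[p][k] + math.sqrt(1.5 * math.log(t) / counts[p][k])
        # Offline oracle: brute-force the best collision-free assignment
        # (one distinct arm per player) under the optimistic indices.
        best = max(itertools.permutations(range(n_arms), n_players),
                   key=lambda a: sum(idx(p, a[p]) for p in range(n_players)))
        for p, k in enumerate(best):
            r = 1.0 if rng.random() < mu[p][k] else 0.0  # Bernoulli reward draw
            counts[p][k] += 1
            means[p][k] += (r - means[p][k]) / counts[p][k]
            total += r
    return total / horizon  # average per-round system reward
```

On a 2-player, 2-arm instance where each player strongly prefers a different arm, the average system reward approaches the optimal value (here 1.8) as the horizon grows, since suboptimal assignments are chosen only logarithmically often.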
