Federated Multi-Armed Bandits

Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning. It is inspired by practical applications in cognitive radio and recommender systems, and enjoys features that are analogous to FL. This paper proposes a general framework of FMAB and then studies two specific federated bandit models. We first study the approximate model, where the heterogeneous local models are random realizations of the global model from an unknown distribution. This model introduces a new uncertainty of client sampling: the global model may not be reliably learned even when finitely many local models are perfectly known, and this uncertainty cannot be quantified a priori without knowledge of the suboptimality gap. We solve the approximate model by proposing Federated Double UCB (Fed2-UCB), which builds on a novel “double UCB” principle accounting for the uncertainties from both arm sampling and client sampling. We show that gradually admitting new clients is critical to achieving an O(log(T)) regret while explicitly accounting for the communication cost. The exact model, where the global bandit model is the exact average of the heterogeneous local models, is then studied as a special case. We show that, somewhat surprisingly, order-optimal regret can be achieved independently of the number of clients with a careful choice of the update periodicity. Experiments using both synthetic and real-world datasets corroborate the theoretical analysis and demonstrate the effectiveness and efficiency of the proposed algorithms.
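To make the “double UCB” idea concrete, the following is a minimal, self-contained Python sketch of a phased elimination scheme in the spirit of Fed2-UCB. It is illustrative only: the problem instance (the hypothetical sample_client and pull helpers), the two confidence-width formulas, the doubling phase lengths, and the one-client-per-phase admission schedule are all simplifying assumptions, not the exact algorithm or analysis from the paper. What it demonstrates is the two-part confidence width: one term shrinks with the number of pulls (arm-sampling uncertainty), the other with the number of participating clients (client-sampling uncertainty).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical problem instance (illustration only) ---
K = 5
GLOBAL_MEANS = rng.uniform(0.2, 0.8, size=K)  # unknown global model

def sample_client():
    # A local model is a noisy realization of the global model
    # (the "approximate" setting described in the abstract).
    return np.clip(GLOBAL_MEANS + rng.normal(0.0, 0.1, size=K), 0.0, 1.0)

def pull(local_means, arm):
    # Bernoulli reward drawn from one client's local model.
    return float(rng.random() < local_means[arm])

# --- Phased elimination with a "double UCB"-style width (assumed form) ---
active = list(range(K))
clients = [sample_client() for _ in range(2)]  # start small, grow over time
sums = np.zeros(K)
pulls = np.zeros(K)

for phase in range(1, 8):
    # Every participating client pulls every active arm in each round.
    for _ in range(2 ** phase):
        for local in clients:
            for a in active:
                sums[a] += pull(local, a)
                pulls[a] += 1
    means = sums[active] / pulls[active]
    # Two confidence terms: arm-sampling noise decays with the number of
    # pulls; client-sampling noise decays with the number of clients.
    width = (np.sqrt(np.log(pulls[active]) / pulls[active])
             + np.sqrt(np.log(len(clients) + 1.0) / len(clients)))
    best_lcb = np.max(means - width)
    active = [a for a, m, w in zip(active, means, width) if m + w >= best_lcb]
    clients.append(sample_client())  # gradually admit a new client
    if len(active) == 1:
        break

print("surviving arm(s):", active, "true best arm:", int(np.argmax(GLOBAL_MEANS)))
```

Growing the client pool phase by phase is what lets the client-sampling width shrink alongside the arm-sampling width; admitting every client at once would incur unnecessary communication cost, which mirrors the trade-off the abstract highlights.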
