Federated Bandit: A Gossiping Approach

In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents who can communicate only with their neighbors, as described by a connected graph $G$. Each agent makes a sequence of decisions, selecting an arm from $M$ candidates, yet it has access only to local and potentially biased feedback (evaluations) of the true reward of each action taken. Learning only locally leads agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, \texttt{Gossip\_UCB}, which couples variants of the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \texttt{Gossip\_UCB} successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves a regret of order $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose \texttt{Fed\_UCB}, a differentially private version of \texttt{Gossip\_UCB}, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}{\epsilon}\log^{2.5} T, \texttt{poly}(N,M) (\log_{\lambda_2^{-1}} N + \log T) \})$ regret.
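The coupling of gossip and UCB described above can be illustrated with a minimal Python sketch. It is not the authors' \texttt{Gossip\_UCB}; it rests on the following illustrative assumptions: the true reward of an arm is the average of the agents' local (biased) means, $G$ is a ring, communication is randomized pairwise gossip (each round one uniformly chosen edge averages its endpoints' estimates), each agent corrects its gossip estimate by the change in its own local sample mean (a dynamic-average-tracking step), and the exploration bonus is the standard UCB1 term computed from the agent's own pull counts. The helper names (ring_edges, gossip_step, ucb_index, gossip_ucb_sketch) and the Gaussian reward noise are hypothetical, and no differential-privacy noise is added, so the sketch does not cover \texttt{Fed\_UCB}.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def ring_edges(n: int) -> List[Edge]:
    """Edges of a ring on n >= 3 nodes (a hypothetical choice of G)."""
    return [(i, (i + 1) % n) for i in range(n)]


def gossip_step(z: List[List[float]], edges: Sequence[Edge], rng: random.Random) -> None:
    """Randomized pairwise gossip: one uniformly chosen edge averages its endpoints."""
    u, v = edges[rng.randrange(len(edges))]
    for k in range(len(z[u])):
        avg = 0.5 * (z[u][k] + z[v][k])
        z[u][k] = avg
        z[v][k] = avg


def ucb_index(estimate: float, pulls: int, t: int) -> float:
    """UCB1-style index: estimate plus sqrt(2 ln t / pulls)."""
    return estimate + math.sqrt(2.0 * math.log(t) / pulls)


def gossip_ucb_sketch(
    local_means: List[List[float]],
    edges: Sequence[Edge],
    horizon: int,
    noise_sd: float = 0.1,
    seed: int = 0,
):
    """Hypothetical gossip + UCB loop; returns (counts, local sample means, gossip estimates)."""
    n_agents, n_arms = len(local_means), len(local_means[0])
    rng = random.Random(seed)
    counts = [[0] * n_arms for _ in range(n_agents)]
    means = [[0.0] * n_arms for _ in range(n_agents)]  # local sample means
    z = [[0.0] * n_arms for _ in range(n_agents)]  # estimates of the network average
    for t in range(1, horizon + 1):
        for i in range(n_agents):
            if t <= n_arms:
                arm = t - 1  # initialization: every agent pulls each arm once
            else:
                arm = max(range(n_arms), key=lambda k: ucb_index(z[i][k], counts[i][k], t))
            # Local, agent-specific (biased) reward observation.
            reward = local_means[i][arm] + rng.gauss(0.0, noise_sd)
            old = means[i][arm]
            counts[i][arm] += 1
            means[i][arm] = old + (reward - old) / counts[i][arm]
            # Tracking correction: inject the change in the local mean into the estimate.
            z[i][arm] += means[i][arm] - old
        gossip_step(z, edges, rng)  # one communication round per time step
    return counts, means, z


if __name__ == "__main__":
    # Agents 0 and 1 locally prefer arms 0 and 1, but the network average
    # (column means 0.2, 0.5, 0.8) favors arm 2.
    local = [
        [0.8, 0.5, 0.2],
        [0.2, 0.8, 0.5],
        [0.0, 0.2, 1.3],
        [-0.2, 0.5, 1.2],
    ]
    counts, _, _ = gossip_ucb_sketch(local, ring_edges(4), horizon=2000, seed=1)
    for i, c in enumerate(counts):
        print(f"agent {i}: pulls per arm = {c}")
```

In this pairwise scheme, the round that selects edge $e=(u,v)$ applies $W_e = I - \frac{1}{2}(e_u - e_v)(e_u - e_v)^\top$, so the expected gossip matrix is $I - L/(2|E|)$, where $L$ is the Laplacian of $G$. For the 4-node ring in the example, $L$ has eigenvalues $0, 2, 2, 4$, which gives $\lambda_2 = 0.75$. Because pairwise averaging preserves sums, the network-wide sum of the gossip estimates always equals the sum of the local sample means. Gossiping pull counts as well would tighten the exploration bonus, but that variant is not shown.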
