Robust Federated Learning in a Heterogeneous Environment

We study a recently proposed large-scale distributed learning paradigm, namely Federated Learning, where the worker machines are end users' own devices. Statistical and computational challenges arise in Federated Learning, particularly in the presence of heterogeneous data distributions (i.e., data points on different devices belong to different distributions signifying different clusters) and Byzantine machines (i.e., machines that may behave abnormally, or even exhibit arbitrary and potentially adversarial behavior). To address these challenges, we first propose a general statistical model for this problem that accounts for both the cluster structure of the users and the Byzantine machines. Then, leveraging this statistical model, we solve the robust heterogeneous Federated Learning problem \emph{optimally}; in particular, our algorithm matches the lower bound on the estimation error in terms of the dimension and the number of data points. Furthermore, as a by-product, we prove statistical guarantees for an outlier-robust clustering algorithm, which can be viewed as Lloyd's algorithm with robust estimation. Finally, we show via synthetic as well as real data experiments that the estimation error obtained by our proposed algorithm is significantly smaller than that of non-Byzantine-robust algorithms; in particular, we gain by at least 53\% and 33\% on synthetic and real data experiments, respectively, in typical settings.
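
The abstract does not spell out which robust estimator is used inside the Lloyd iterations, so the following is only a minimal illustrative sketch in NumPy of the general idea: a Lloyd-style clustering loop whose center-update step replaces the plain mean with a coordinate-wise trimmed mean, so that outlier or Byzantine points have bounded influence on the centers. The function names trimmed_mean and robust_lloyd, and parameters such as trim_frac, are hypothetical and not taken from the paper.

import numpy as np

def trimmed_mean(points, trim_frac=0.1):
    # Coordinate-wise trimmed mean: drop the smallest and largest
    # trim_frac fraction of values in every coordinate, then average.
    k = int(trim_frac * len(points))
    sorted_pts = np.sort(points, axis=0)
    if k > 0:
        sorted_pts = sorted_pts[k:-k]
    return sorted_pts.mean(axis=0)

def robust_lloyd(X, num_clusters, num_iters=20, trim_frac=0.1, seed=0):
    # Lloyd-style alternation: assign each point to its nearest center,
    # then re-estimate each center with a robust (trimmed-mean) estimator
    # instead of the ordinary sample mean.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=num_clusters, replace=False)
    centers = X[idx].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(num_iters):
        # Assignment step: nearest center in Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: robust estimate of each cluster's center.
        for c in range(num_clusters):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = trimmed_mean(members, trim_frac)
    return centers, labels

Any bounded-influence location estimator (e.g., the geometric median) could be substituted in the update step; the point of the sketch is only the structure of the robustified Lloyd alternation, not the specific estimator analyzed in the paper.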
