Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

In Byzantine-robust distributed optimization, a central server wants to train a machine learning model over data distributed across multiple workers. However, a fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages to the server. While this problem has received significant attention recently, most current defenses assume that the workers have identical data. For the realistic case where the data across workers is heterogeneous (non-iid), we design new attacks that circumvent these defenses, leading to a significant loss of performance. We then propose a simple resampling scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost. We theoretically and experimentally validate our approach, showing that combining resampling with existing robust algorithms is effective against challenging attacks.
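
The following is a minimal sketch of the idea described above, under the assumption that the resampling step replaces each worker gradient with the average of s randomly chosen worker gradients before handing the result to an off-the-shelf robust aggregator (coordinate-wise median is used here purely as an example). All function names and parameters are illustrative, not the paper's implementation.

```python
import numpy as np

def resample(gradients, s, rng):
    """Replace each of the n worker gradients by the average of s gradients
    drawn uniformly at random. Averaging mixes heterogeneous (non-iid)
    gradients so that honest inputs to the aggregator look more alike."""
    n = len(gradients)
    resampled = []
    for _ in range(n):
        idx = rng.choice(n, size=s, replace=False)
        resampled.append(np.mean([gradients[i] for i in idx], axis=0))
    return resampled

def coordinatewise_median(gradients):
    """An existing robust aggregator: the coordinate-wise median."""
    return np.median(np.stack(gradients), axis=0)

def robust_aggregate(gradients, s=2, seed=0):
    """Resample first, then apply the robust aggregator to the mixed gradients."""
    rng = np.random.default_rng(seed)
    return coordinatewise_median(resample(gradients, s, rng))

# Usage: the server aggregates gradients from 10 workers (5-dimensional here).
grads = [np.random.randn(5) for _ in range(10)]
update = robust_aggregate(grads, s=2)
```

The extra cost is one pass of index sampling and averaging over the n gradients, which is negligible next to the aggregation itself, and the same wrapper can be placed in front of other robust aggregators.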
