Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy

We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents, each with local data, and a trusted centralized coordinator. In the fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions, which are defined over their local data. We consider a scenario in which some of the agents (f out of N) are Byzantine faulty. Such agents need not follow a prescribed algorithm correctly and may communicate arbitrary incorrect information to the coordinator. In the presence of Byzantine agents, a more reasonable goal for the non-faulty agents is to find a minimizer of the aggregate cost function of only the non-faulty agents. This goal is commonly referred to as exact fault-tolerance. Recent work has shown that exact fault-tolerance is achievable if and only if the non-faulty agents satisfy the property of 2f-redundancy. Under this property, techniques are known that impart exact fault-tolerance to the distributed implementation of the classical stochastic gradient-descent (SGD) algorithm. However, we do not know of any such technique for the federated local SGD algorithm, a method more commonly used in federated machine learning. To address this issue, we propose a novel technique named comparative elimination (CE). We show that, under 2f-redundancy, the federated local SGD algorithm with CE obtains exact fault-tolerance in the deterministic setting, when the non-faulty agents can accurately compute the gradients of their local cost functions. In the general stochastic setting, when agents can only compute unbiased noisy estimates of their local gradients, our algorithm achieves approximate fault-tolerance, with an approximation error proportional to the variance of the stochastic gradients and the fraction of Byzantine agents.
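To make the aggregation step concrete, the sketch below illustrates one plausible reading of comparative elimination at the coordinator: the agents' submitted local models are ranked by their Euclidean distance to the coordinator's current global model, the f farthest submissions are eliminated, and the remaining N - f are averaged. This is a minimal illustration under that assumption, not the paper's reference implementation; the function names ce_aggregate and local_sgd_round, the learning rate, and the number of local steps are all hypothetical.

```python
import numpy as np


def ce_aggregate(global_model, local_models, f):
    """Comparative-elimination (CE) style aggregation (illustrative sketch).

    Keeps the N - f submitted models closest to the coordinator's current
    global model and averages them; up to f submissions may be Byzantine.
    """
    n = len(local_models)
    # Illustrative sanity check: with N > 2f, a majority of the kept models is non-faulty.
    assert n > 2 * f, "need strictly more than 2f agents"
    # Distance of each submission from the coordinator's current model.
    dists = [np.linalg.norm(m - global_model) for m in local_models]
    # Eliminate the f submissions farthest from the current model, average the rest.
    keep = np.argsort(dists)[: n - f]
    return np.mean([local_models[i] for i in keep], axis=0)


def local_sgd_round(global_model, grad_fns, lr=0.1, local_steps=5, f=1):
    """One round of federated local SGD with CE filtering at the coordinator.

    grad_fns: one callable per agent returning a (possibly noisy) local gradient;
    a Byzantine agent may effectively report an arbitrary vector instead.
    """
    submissions = []
    for grad in grad_fns:
        model = global_model.copy()
        for _ in range(local_steps):  # each agent runs a few local SGD steps
            model -= lr * grad(model)
        submissions.append(model)
    return ce_aggregate(global_model, submissions, f)


# Toy usage: three honest agents minimizing ||x - 1||^2 (so 2f-redundancy holds
# trivially) and one Byzantine agent reporting a huge bogus gradient.
honest = lambda x: 2.0 * (x - 1.0)
byzantine = lambda x: 1e6 * np.ones_like(x)
x = np.zeros(2)
for _ in range(50):
    x = local_sgd_round(x, [honest, honest, honest, byzantine], f=1)
print(x)  # approaches [1, 1] despite the faulty agent
```

In this toy run the Byzantine submission lands far from the current global model, so it is the one eliminated each round and the averaged update is driven only by the non-faulty agents; this is meant only to illustrate the filtering idea, not to reproduce the paper's convergence guarantees.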
