Byzantine Fault-Tolerant Parallelized Stochastic Gradient Descent for Linear Regression

This paper addresses the problem of Byzantine fault-tolerance in the parallelized stochastic gradient descent (SGD) method for solving a linear regression problem. We consider a synchronous system comprising a master and multiple workers, where up to a (known) constant number of workers are Byzantine faulty. Byzantine faulty workers may send incorrect information to the master during an execution of the parallelized SGD method. To mitigate the detrimental impact of Byzantine faulty workers, we replace the averaging of gradients in the traditional parallelized SGD method with a provably more robust gradient aggregation rule. The crux of the proposed aggregation rule is a gradient-filter, named the comparative gradient clipping (CGC) filter. We show that the resulting parallelized SGD method obtains a good estimate of the regression parameter even in the presence of a bounded fraction of Byzantine faulty workers. The upper bound derived for the asymptotic estimation error grows only linearly with the fraction of Byzantine faulty workers.
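As an illustration, below is a minimal NumPy sketch of one plausible reading of such a norm-based gradient-filter, assuming the master receives n gradients per iteration and knows an upper bound f on the number of Byzantine workers. The function name cgc_aggregate and the exact choice of clipping threshold (the (n - f)-th smallest gradient norm) are assumptions for illustration, not the paper's definitive specification of the CGC filter.

```python
import numpy as np

def cgc_aggregate(gradients, f):
    """Comparative gradient clipping (CGC) aggregation -- illustrative sketch.

    Assumed reading: sort the received gradients by norm, scale the f
    gradients of largest norm down to the (n - f)-th smallest norm
    (keeping their direction), and average the result.

    gradients : list of n worker gradients (1-D np.ndarray)
    f         : assumed upper bound on the number of Byzantine workers
    """
    n = len(gradients)
    norms = np.array([np.linalg.norm(g) for g in gradients])
    # Clipping threshold: the (n - f)-th smallest gradient norm.
    threshold = np.sort(norms)[n - f - 1]
    clipped = [
        g * (threshold / norm) if norm > threshold else g
        for g, norm in zip(gradients, norms)
    ]
    # The robust aggregate replaces plain averaging in parallelized SGD.
    return np.mean(clipped, axis=0)

# One synchronous SGD step at the master (sketch):
#   w = w - step_size * cgc_aggregate(worker_gradients, f)
```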
