Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, in which different workers may hold different local datasets, and we make no probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. To apply their filtering procedure in our heterogeneous data setting, where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly convex and non-convex objectives and show that our convergence rates match those of vanilla SGD in the Byzantine-free setting. To bound the heterogeneity, we assume that the gradients at different workers have bounded deviation from each other, and we also provide concrete bounds on this deviation in the statistical heterogeneous data model.
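
As a rough illustration of the aggregation step described above, the following minimal Python sketch simulates a master that collects per-worker stochastic gradients and, instead of averaging them directly, combines them with an iterative spectral outlier filter in the spirit of the Steinhardt et al. procedure. The helper names (spectral_filter, run_byzantine_sgd), the toy quadratic local losses, and all thresholds and constants are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
# Minimal sketch of Byzantine-resilient distributed SGD with a spectral outlier filter.
# Assumptions: toy quadratic local losses, a fixed filtering threshold, and a soft
# down-weighting rule; these are illustrative choices, not the paper's specification.
import numpy as np

rng = np.random.default_rng(0)


def spectral_filter(grads, threshold=10.0, max_rounds=10):
    """Iteratively down-weight gradients that align with the top variance direction."""
    n, _ = grads.shape
    weights = np.ones(n)
    for _ in range(max_rounds):
        w = weights / weights.sum()
        mu = w @ grads                               # weighted mean of received gradients
        centered = grads - mu
        cov = (centered * w[:, None]).T @ centered   # weighted empirical covariance (d x d)
        eigvals, eigvecs = np.linalg.eigh(cov)
        lam, v = eigvals[-1], eigvecs[:, -1]         # top eigenvalue / eigenvector
        if lam <= threshold:                         # covariance already well concentrated
            break
        tau = (centered @ v) ** 2                    # outlier score: squared projection on v
        weights = weights * (1.0 - tau / tau.max())  # soft-remove the worst offenders
    return (weights / weights.sum()) @ grads


def run_byzantine_sgd(n_workers=20, n_byz=4, dim=10, steps=200, lr=0.1):
    """Toy master-worker loop: honest worker i holds f_i(x) = ||x - b_i||^2 / 2."""
    b = rng.normal(size=(n_workers, dim))            # heterogeneous local optima
    x = np.zeros(dim)
    target = b[n_byz:].mean(axis=0)                  # minimizer of the average honest loss
    for _ in range(steps):
        grads = (x - b) + 0.1 * rng.normal(size=(n_workers, dim))  # stochastic local gradients
        grads[:n_byz] = 100.0 * rng.normal(size=(n_byz, dim))      # Byzantine workers send garbage
        x = x - lr * spectral_filter(grads)
    return np.linalg.norm(x - target)


if __name__ == "__main__":
    print("distance to the honest optimum:", run_byzantine_sgd())
```

The filter repeatedly inspects the top eigenvector of the weighted gradient covariance and down-weights workers whose gradients project heavily onto it; this is the basic mechanism by which the master can suppress corrupt gradients without knowing in advance which workers are Byzantine.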

[1] J. Tukey, A Survey of Sampling from Contaminated Distributions, 1960.

[2] J. Tukey, Mathematics and the Picturing of Data, 1975.

[3] Franco P. Preparata et al., The Densest Hemisphere Problem, 1978, Theor. Comput. Sci.

[4] Frederick R. Forst et al., On Robust Estimation of the Location Parameter, 1980.

[5] Leslie Lamport et al., The Byzantine Generals Problem, 1982, ACM Trans. Program. Lang. Syst.

[6] Sanjay Ghemawat et al., MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[7] H. Robbins, A Stochastic Approximation Method, 1951.

[8] Nikhil Srivastava et al., Twice-Ramanujan Sparsifiers, 2008, STOC '09.

[9] Léon Bottou et al., Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.

[10] Roman Vershynin et al., Introduction to the Non-Asymptotic Analysis of Random Matrices, 2010, Compressed Sensing.

[11] Santosh S. Vempala et al., Agnostic Estimation of Mean and Covariance, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[12] Peter Richtárik et al., Federated Optimization: Distributed Machine Learning for On-Device Intelligence, 2016, arXiv.

[13] Daniel M. Kane et al., Robust Estimators in High Dimensions without the Computational Intractability, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[14] Dan Alistarh et al., QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.

[15] Jakub Konecný et al., Stochastic, Distributed and Federated Optimization for Machine Learning, 2017, arXiv.

[16] Gregory Valiant et al., Learning from Untrusted Data, 2016, STOC.

[17] Rachid Guerraoui et al., Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent, 2017, NIPS.

[18] Lili Su et al., Distributed Statistical Machine Learning in Adversarial Settings, 2017, Proc. ACM Meas. Anal. Comput. Syst.

[19] Ananda Theertha Suresh et al., Distributed Mean Estimation with Limited Communication, 2016, ICML.

[20] Dan Alistarh et al., Byzantine Stochastic Gradient Descent, 2018, NeurIPS.

[21] Dan Alistarh et al., The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.

[22] Dimitris S. Papailiopoulos et al., DRACO: Byzantine-Resilient Distributed Training via Redundant Gradients, 2018, ICML.

[23] Martin Jaggi et al., Sparsified SGD with Memory, 2018, NeurIPS.

[24] Kannan Ramchandran et al., Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates, 2018, ICML.

[25] Paulo Tabuada et al., Will Distributed Computing Revolutionize Peace? The Emergence of Battlefield IoT, 2018, IEEE 38th International Conference on Distributed Computing Systems (ICDCS).

[26] Gregory Valiant et al., Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers, 2017, ITCS.

[27] Kamyar Azizzadenesheli et al., signSGD: Compressed Optimisation for Non-Convex Problems, 2018, ICML.

[28] Daniel M. Kane et al., Recent Advances in Algorithmic High-Dimensional Robust Statistics, 2019, arXiv.

[29] Suhas Diggavi et al., Byzantine-Tolerant Distributed Coordinate Descent, 2019, IEEE International Symposium on Information Theory (ISIT).

[30] Rong Jin et al., On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization, 2019, ICML.

[31] Nitin H. Vaidya et al., Byzantine Fault-Tolerant Parallelized Stochastic Gradient Descent for Linear Regression, 2019, 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[32] Martin Jaggi et al., Error Feedback Fixes SignSGD and Other Gradient Compression Schemes, 2019, ICML.

[33] Suhas Diggavi et al., Data Encoding Methods for Byzantine-Resilient Distributed Optimization, 2019, IEEE International Symposium on Information Theory (ISIT).

[34] Hongyi Wang et al., DETOX: A Redundancy-Based Framework for Faster and More Robust Gradient Aggregation, 2019, NeurIPS.

[35] Kamyar Azizzadenesheli et al., signSGD with Majority Vote is Communication Efficient and Fault Tolerant, 2018, ICLR.

[36] Kannan Ramchandran et al., Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning, 2018, ICML.

[37] Lili Su et al., Securing Distributed Gradient Descent in High Dimensional Statistical Learning, 2018, Proc. ACM Meas. Anal. Comput. Syst.

[38] K. Ramchandran et al., Communication-Efficient and Byzantine-Robust Distributed Learning, 2019, 2020 Information Theory and Applications Workshop (ITA).

[39] Suhas Diggavi et al., Qsparse-Local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.

[40] Richard Nock et al., Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.

[41] Suhas Diggavi et al., Data Encoding for Byzantine-Resilient Distributed Optimization, 2021, IEEE Transactions on Information Theory.