Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data

Recent advances in sensor technologies and smart devices enable the collaborative collection of massive volumes of data from multiple information sources. As a promising tool for efficiently extracting useful information from such big data, machine learning has been pushed to the forefront and has seen great success in a wide range of areas such as computer vision, health care, and financial market analysis. To accommodate the large volume of data, there has been a surge of interest in the design of distributed machine learning methods, among which stochastic gradient descent (SGD) is one of the most widely adopted. Nonetheless, distributed machine learning methods may be vulnerable to Byzantine attacks, in which an adversary deliberately shares falsified information to disrupt the intended learning procedure. In this work, two asynchronous Byzantine-tolerant SGD algorithms are proposed, in which the honest collaborating workers are assumed to store the model parameters derived from their own local data and use them as the ground truth. The proposed algorithms can tolerate an arbitrary number of Byzantine attackers and are provably convergent. Simulation results based on a real-world dataset verify the theoretical results and demonstrate the effectiveness of the proposed algorithms.
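
The following is a minimal, hypothetical sketch of the general idea described in the abstract: each honest worker maintains a parameter vector trained on its own local data and uses it as a reference ("ground truth") to screen parameter vectors received from possibly Byzantine peers. The distance-threshold rule, the averaging step, and all names (screen_and_aggregate, tau, grad_fn) are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def local_sgd_step(theta, grad_fn, batch, lr=0.01):
    """One SGD step on the worker's own local data (grad_fn is user-supplied)."""
    return theta - lr * grad_fn(theta, batch)

def screen_and_aggregate(theta_local, received, tau):
    """Keep only received parameter vectors within distance tau of the locally
    derived parameters, then average the accepted vectors with the local copy.
    (Assumed screening rule for illustration only.)"""
    accepted = [theta_local]
    for theta_j in received:
        if np.linalg.norm(theta_j - theta_local) <= tau:
            accepted.append(theta_j)
    return np.mean(accepted, axis=0)

# Hypothetical usage inside a worker's loop:
#   theta = local_sgd_step(theta, grad_fn, next_batch)
#   theta = screen_and_aggregate(theta, msgs_from_peers, tau=1.0)
```

The asynchronous setting in the paper would additionally allow workers to apply such a screening step whenever messages arrive, without waiting for all peers; the sketch above omits that scheduling detail.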
