FABA: An Algorithm for Fast Aggregation against Byzantine Attacks in Distributed Neural Networks

Training a large-scale deep neural network on a single machine becomes increasingly difficult as the network model grows more complex. Distributed training provides an efficient solution, but Byzantine attacks may occur on participating workers: they may be compromised or suffer from hardware failures. If they upload poisonous gradients, the training becomes unstable or even converges to a saddle point. In this paper, we propose FABA, a Fast Aggregation algorithm against Byzantine Attacks, which removes outliers from the uploaded gradients and obtains aggregated gradients close to the true gradients. We show the convergence of our algorithm. Experiments demonstrate that our algorithm achieves performance similar to the non-Byzantine case and higher efficiency than previous algorithms.
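
To illustrate the aggregation idea in the abstract, below is a minimal, hypothetical sketch of outlier filtering on uploaded gradients. It assumes that at most `num_byzantine` workers are Byzantine and that an outlier is the gradient farthest (in Euclidean distance) from the mean of the gradients still under consideration; the function name, parameters, and removal criterion are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def faba_style_aggregate(gradients, num_byzantine):
    """Aggregate worker gradients after discarding suspected outliers.

    gradients     : list of 1-D numpy arrays, one per worker
    num_byzantine : assumed upper bound on the number of Byzantine workers
    """
    remaining = [np.asarray(g, dtype=float) for g in gradients]
    for _ in range(num_byzantine):
        mean = np.mean(remaining, axis=0)
        # Distance of each remaining gradient from the current mean.
        dists = [np.linalg.norm(g - mean) for g in remaining]
        # Drop the gradient farthest from the mean as a suspected outlier.
        remaining.pop(int(np.argmax(dists)))
    # Average the gradients that survived the filtering.
    return np.mean(remaining, axis=0)

# Example: eight honest workers plus two adversarial uploads.
rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.1, size=4) for _ in range(8)]
poisoned = [np.full(4, 50.0), np.full(4, -50.0)]
print(faba_style_aggregate(honest + poisoned, num_byzantine=2))
```

Removing one suspect at a time keeps each subsequent mean from being dragged toward extreme values, so the surviving gradients stay close to what honest workers would report.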
