A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy

As deep learning models are usually massive and complex, distributed learning is essential for increasing training efficiency. Moreover, in many real-world application scenarios like healthcare, distributed learning can also keep the data local and protect privacy. A popular distributed learning strategy is federated learning, in which a central server stores the global model and a set of local computing nodes update the model parameters with their own data. The updated model parameters are processed and transmitted to the central server, which incurs heavy communication costs. Recently, asynchronous decentralized distributed learning has been proposed and shown to be a more efficient and practical strategy: there is no central server, and each computing node communicates only with its neighbors. Although no raw data are transmitted across local nodes, the communication process still risks leaking information that malicious participants can exploit for attacks. In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD (ADPSGD) framework, A(DP)$^2$SGD for short, which retains the communication efficiency of ADPSGD while defending against inference attacks by malicious participants. Specifically, Rényi differential privacy is used to provide a tighter privacy analysis for our composite Gaussian mechanisms, and the convergence rate is consistent with the non-private version: theoretical analysis shows that A(DP)$^2$SGD converges at the same optimal $\mathcal{O}(1/\sqrt{T})$ rate as SGD. Empirically, A(DP)$^2$SGD achieves comparable model accuracy to the differentially private version of synchronous SGD (SSGD) but runs much faster in heterogeneous computing environments.
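The per-worker recipe the abstract describes (clip the local stochastic gradient, perturb it with Gaussian noise, take a local SGD step, then average parameters with a randomly chosen neighbor instead of reporting to a central server) can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the helper names, the clipping bound `clip_bound`, the noise multiplier `sigma`, the whole-batch (rather than per-example) clipping, and the sequential two-worker loop standing in for truly asynchronous gossip are all simplifying assumptions made for readability.

```python
# Illustrative sketch of one A(DP)^2SGD-style worker step; all names and
# hyperparameters here are assumptions, not the paper's reference code.
import numpy as np

rng = np.random.default_rng(0)

def private_local_step(w, grad, lr, clip_bound, sigma):
    """Clipped, noised SGD update on one worker (Gaussian mechanism)."""
    # Clip the gradient's L2 norm to bound sensitivity (simplified:
    # real DP-SGD variants clip per-example gradients before averaging).
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_bound / (norm + 1e-12))
    # Add Gaussian noise calibrated to the clipping bound.
    noisy_grad = grad + rng.normal(0.0, sigma * clip_bound, size=grad.shape)
    return w - lr * noisy_grad

def gossip_average(w_i, w_j):
    """Pairwise model averaging with a selected neighbor; in ADPSGD this
    happens asynchronously, with no global synchronization barrier."""
    avg = 0.5 * (w_i + w_j)
    return avg, avg

# Toy usage: two workers on a shared least-squares objective.
d = 5
X, y = rng.normal(size=(32, d)), rng.normal(size=32)
w = [rng.normal(size=d) for _ in range(2)]
for t in range(100):
    for i in range(2):
        batch = rng.choice(32, size=8, replace=False)
        grad = X[batch].T @ (X[batch] @ w[i] - y[batch]) / len(batch)
        w[i] = private_local_step(w[i], grad, lr=0.05,
                                  clip_bound=1.0, sigma=0.5)
    # Sequential here for simplicity; asynchronous in the real system.
    w[0], w[1] = gossip_average(w[0], w[1])
```

Because each transmitted model has already been perturbed by the Gaussian mechanism, neighbors (or eavesdroppers) only ever observe noised parameters, and the Rényi-DP accountant mentioned in the abstract tracks the cumulative privacy loss across the composed noisy updates.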
