Combining Differential Privacy and Byzantine Resilience in Distributed SGD

Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually, but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the remaining workers, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on (α, f)-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of (α, f)-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.

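To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of distributed SGD in the parameter-server architecture: honest workers clip their gradients and add Gaussian noise to enforce DP, Byzantine workers send arbitrary vectors, and the server aggregates with coordinate-wise median, one standard (α, f)-Byzantine-resilient rule. The toy quadratic objective and all constants (CLIP_NORM, SIGMA, worker counts) are illustrative assumptions, not values from the paper.

```python
# Sketch: distributed SGD with DP noise on honest workers and a
# Byzantine-resilient aggregation rule at the parameter server.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_WORKERS, N_BYZANTINE = 10, 10, 3   # f = 3 Byzantine out of n = 10 workers
CLIP_NORM, SIGMA, LR = 1.0, 0.5, 0.1      # DP clipping bound, noise scale, step size

theta = rng.normal(size=DIM)              # model held by the parameter server
optimum = np.zeros(DIM)                   # toy objective: 0.5 * ||theta||^2

def honest_gradient(theta):
    """Honest worker: true gradient, clipped and perturbed with Gaussian noise (DP)."""
    grad = theta - optimum                                        # gradient of the toy loss
    grad *= min(1.0, CLIP_NORM / (np.linalg.norm(grad) + 1e-12))  # norm clipping
    return grad + rng.normal(scale=SIGMA, size=DIM)               # Gaussian mechanism

def byzantine_gradient():
    """Byzantine worker: sends an arbitrary, adversarially scaled vector."""
    return -100.0 * rng.normal(size=DIM)

for step in range(200):
    grads = [honest_gradient(theta) for _ in range(N_WORKERS - N_BYZANTINE)]
    grads += [byzantine_gradient() for _ in range(N_BYZANTINE)]
    # Coordinate-wise median tolerates up to f < n/2 Byzantine inputs.
    aggregate = np.median(np.stack(grads), axis=0)
    theta -= LR * aggregate

print("distance to optimum:", np.linalg.norm(theta - optimum))
```

The DP noise (SIGMA) inflates the dispersion of honest gradients, which is exactly what weakens (α, f)-BR guarantees; the paper's point is that the learning rate, clipping bound, and noise scale must be re-tuned jointly rather than set by the standard DP and BR recipes in isolation.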