Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent

While machine learning has achieved remarkable results in a wide variety of domains, training models often requires large datasets that may need to be collected from many individuals. Because an individual's data may contain sensitive information, sharing training data can raise severe privacy concerns. There is therefore a compelling need for privacy-aware machine learning methods, and one effective approach is to leverage the generic framework of differential privacy. Since stochastic gradient descent (SGD) is one of the most widely adopted methods for large-scale machine learning problems, this work proposes two decentralized differentially private SGD algorithms. In particular, we focus on SGD without replacement, whose structure is favorable for practical implementation. Privacy and convergence analyses are provided for both algorithms. Finally, extensive experiments are performed to verify the theoretical results and demonstrate the effectiveness of the proposed algorithms.
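As a rough illustration of the without-replacement (random-reshuffling) sampling and differentially private updates the abstract refers to, the sketch below shows a single-machine DP-SGD pass in Python. It is a minimal sketch, not the decentralized algorithms analyzed in the paper: the clipping bound, noise scale, and function names (e.g., `grad_fn`) are illustrative assumptions, and the noise calibration follows the standard Gaussian-mechanism recipe.

```python
import numpy as np

def private_sgd_without_replacement(grad_fn, w0, data, epochs=1, lr=0.1,
                                    clip=1.0, noise_std=1.0, seed=0):
    """Illustrative DP-SGD with without-replacement sampling.

    Each epoch visits every example exactly once in a fresh random order
    (random reshuffling). grad_fn(w, x) is assumed to return the per-example
    gradient of the loss at w. Per-example gradients are clipped to L2 norm
    `clip` to bound sensitivity and then perturbed with Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        order = rng.permutation(len(data))        # one without-replacement pass
        for i in order:
            g = grad_fn(w, data[i])
            norm = np.linalg.norm(g)
            if norm > clip:                       # clip to bound sensitivity
                g = g * (clip / norm)
            # Gaussian noise scaled to the clipping bound (illustrative calibration)
            g = g + rng.normal(scale=noise_std * clip, size=g.shape)
            w = w - lr * g
    return w
```

In the decentralized setting described in the abstract, each worker would run such noisy without-replacement passes over its local shard and exchange only perturbed updates, rather than raw data.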
