Practical and Private (Deep) Learning without Sampling or Shuffling

We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification.
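At its core, DP-FTRL replaces amplified subsampling with the tree-aggregation ("binary") mechanism for privately releasing prefix sums of clipped gradients, so each gradient influences only about log2(T) noise-perturbed values regardless of how the data is ordered or batched, and the model is recovered from the noisy prefix sum. The sketch below illustrates this idea under simplifying assumptions and is not the paper's implementation: the names TreeAggregator, dp_ftrl, and the user-supplied grad_fn oracle are illustrative; it clips the whole mini-batch gradient rather than per-example gradients; and it omits momentum, tree restarts, and the more efficient node estimators used in practice.

import numpy as np


class TreeAggregator:
    # Binary-tree ("binary mechanism") release of prefix sums: each clipped
    # gradient enters at most ceil(log2(T)) + 1 tree nodes, every node gets
    # independent Gaussian noise, and the privacy accounting needs no
    # sampling or shuffling assumption.
    def __init__(self, dim, noise_std, rng):
        self.dim, self.noise_std, self.rng = dim, noise_std, rng
        self.t = 0
        self.alpha = {}      # level -> exact partial sum over a dyadic range
        self.alpha_hat = {}  # level -> noisy version of that partial sum

    def append(self, x):
        # Insert one clipped gradient and return the noisy prefix sum so far.
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # lowest set bit of t
        self.alpha[i] = x + sum((self.alpha[j] for j in range(i)),
                                np.zeros(self.dim))
        for j in range(i):  # levels below i are folded into level i
            self.alpha.pop(j, None)
            self.alpha_hat.pop(j, None)
        self.alpha_hat[i] = self.alpha[i] + self.rng.normal(
            0.0, self.noise_std, size=self.dim)
        return sum((self.alpha_hat[j] for j in range(self.t.bit_length())
                    if (self.t >> j) & 1), np.zeros(self.dim))


def dp_ftrl(theta0, batches, grad_fn, clip_norm, noise_multiplier, lr, seed=0):
    # theta_t = theta_0 - lr * (noisy prefix sum of clipped gradients).
    # For brevity this clips each mini-batch gradient as a whole; production
    # implementations clip per-example gradients before summing.
    rng = np.random.default_rng(seed)
    tree = TreeAggregator(theta0.size, noise_multiplier * clip_norm, rng)
    theta = theta0.copy()
    for batch in batches:
        g = grad_fn(theta, batch)
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip
        theta = theta0 - lr * tree.append(g)
    return theta

Because the noise across rounds is shared through common tree nodes, the error in the released prefix sum grows only polylogarithmically in the number of steps, which is what lets this style of mechanism remain competitive with amplified DP-SGD without any sampling or shuffling assumptions.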
