Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization

A common pain point in differentially private machine learning is the significant runtime overhead incurred when executing Differentially Private Stochastic Gradient Descent (DPSGD), which may be as large as two orders of magnitude. We demonstrate that by exploiting powerful language primitives, including vectorization, just-in-time compilation, and static graph optimization, one can dramatically reduce these overheads, in many cases nearly matching the best non-private running times. These gains are realized in two frameworks: JAX and TensorFlow. JAX provides rich support for these primitives as core features of the language through the XLA compiler. We also rebuild core parts of TensorFlow Privacy, integrating features from TensorFlow 2 as well as XLA compilation, which yields significant memory and runtime improvements over the current release version. These approaches allow us to achieve up to 50x speedups in comparison to the best alternatives. Our code is publicly available.
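To make the role of vectorization and just-in-time compilation concrete, the following is a minimal JAX sketch of one DPSGD step: per-example gradients via `jax.vmap`, per-example clipping, Gaussian noise added to the summed gradient, and the whole step compiled with `jax.jit`. It is an illustration under stated assumptions, not the authors' implementation: the linear model, the names `loss_fn`, `clip_gradient`, and `dp_sgd_step`, and all hyperparameter values are hypothetical, and no privacy accounting is performed.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Squared error of a toy linear model on a single example (assumed model)."""
    w, b = params
    pred = jnp.dot(x, w) + b
    return 0.5 * (pred - y) ** 2


def clip_gradient(grad, clip_norm):
    """Rescale one example's gradient pytree so its global L2 norm is at most clip_norm."""
    leaves = jax.tree_util.tree_leaves(grad)
    total_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
    scale = jnp.minimum(1.0, clip_norm / (total_norm + 1e-12))
    return jax.tree_util.tree_map(lambda g: g * scale, grad)


@jax.jit  # compile the whole step with XLA
def dp_sgd_step(params, xs, ys, key, lr, clip_norm, noise_multiplier):
    # Vectorize the single-example gradient over the batch dimension.
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    # Clip each example's gradient independently.
    clipped = jax.vmap(clip_gradient, in_axes=(0, None))(per_example_grads, clip_norm)
    # Sum the clipped gradients over the batch.
    summed = jax.tree_util.tree_map(lambda g: jnp.sum(g, axis=0), clipped)
    # Add Gaussian noise with standard deviation noise_multiplier * clip_norm.
    leaves, treedef = jax.tree_util.tree_flatten(summed)
    keys = jax.random.split(key, len(leaves))
    noisy = [
        g + noise_multiplier * clip_norm * jax.random.normal(k, g.shape, g.dtype)
        for g, k in zip(leaves, keys)
    ]
    # Average over the batch and take a gradient step.
    batch_size = xs.shape[0]
    noisy_mean = jax.tree_util.tree_unflatten(treedef, [g / batch_size for g in noisy])
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy_mean)


# Usage on synthetic data (all values are illustrative).
key = jax.random.PRNGKey(0)
data_key, label_key, noise_key = jax.random.split(key, 3)
xs = jax.random.normal(data_key, (8, 3))
ys = jax.random.normal(label_key, (8,))
params = (jnp.zeros(3), jnp.array(0.0))
new_params = dp_sgd_step(params, xs, ys, noise_key, 0.1, 1.0, 1.1)
```

Because `jax.vmap` turns the single-example gradient into one batched computation and `jax.jit` compiles the full step, no Python-level loop over examples is needed, which is the kind of overhead reduction the abstract describes.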
