Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

Abstract Recent work on Rényi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite this promise, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research on model architectures, optimizers, and related design choices. One barrier to this expanded research is training time, which is often orders of magnitude longer than for non-private networks. The cause of this slowdown is a crucial privacy-related step called “per-example gradient clipping,” whose naive implementation undoes the benefits of batched training on GPUs. By analyzing the back-propagation equations, we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our PyTorch implementation achieved significant training speed-ups (by factors of 54x to 94x when training various models with batch sizes of 128). These techniques work for a variety of architectural choices, including convolutional layers, recurrent networks, attention, and residual blocks.
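To make the bottleneck concrete, the sketch below contrasts naive per-example clipping (one backward pass per example) with a batched variant for a single linear layer that exploits the outer-product structure of the weight gradient, in the spirit of efficient per-example gradient computations. This is a minimal illustration with assumed toy data and loss, not the paper's implementation; bias terms are omitted for brevity.

import torch
import torch.nn as nn

# Minimal sketch (assumed toy setup): one linear layer, squared-error loss, batch of B examples.
torch.manual_seed(0)
B, d_in, d_out = 8, 5, 3
layer = nn.Linear(d_in, d_out)
x = torch.randn(B, d_in)
y = torch.randn(B, d_out)
clip_norm = 1.0

# Naive per-example clipping: one backward pass per example, so the GPU sees batches of size 1.
naive_sum = torch.zeros_like(layer.weight)
for i in range(B):
    layer.zero_grad()
    loss_i = ((layer(x[i:i+1]) - y[i:i+1]) ** 2).sum()
    loss_i.backward()
    g = layer.weight.grad
    scale = torch.clamp(clip_norm / (g.norm() + 1e-6), max=1.0)
    naive_sum += scale * g

# Batched alternative: for a linear layer, example i's weight gradient is the outer product
# delta_i x_i^T, so its Frobenius norm is ||delta_i|| * ||x_i||. One batched forward/backward
# pass yields all per-example norms, after which the clipped gradients are summed at once.
out = layer(x)                                    # (B, d_out)
loss = ((out - y) ** 2).sum()
delta = torch.autograd.grad(loss, out)[0]         # (B, d_out): per-example grads w.r.t. outputs
per_ex_norms = delta.norm(dim=1) * x.norm(dim=1)  # (B,): per-example weight-gradient norms
scales = torch.clamp(clip_norm / (per_ex_norms + 1e-6), max=1.0)
fast_sum = torch.einsum('b,bo,bi->oi', scales, delta, x)  # sum of clipped per-example grads

print(torch.allclose(naive_sum, fast_sum, atol=1e-5))  # True: both give the same clipped sum

The batched variant keeps the GPU working on full batches while only ever materializing per-example gradient norms, never the per-example gradients themselves; extending this idea beyond a single linear layer (to convolutions, recurrence, attention, and residual blocks) is where layer-specific back-propagation analysis becomes necessary.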
