Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

Abstract Recent work on Rényi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite this promise, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research on model architectures, optimizers, and related design choices. One barrier to this expanded research is training time, which is often orders of magnitude longer than for non-private networks. The cause of this slowdown is a crucial privacy-related step called “per-example gradient clipping,” whose naive implementation undoes the benefits of batched training on GPUs. By analyzing the back-propagation equations, we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our PyTorch implementation achieved significant training speed-ups (by factors of 54x to 94x when training various models with batch sizes of 128). These techniques work for a variety of architectural choices, including convolutional layers, recurrent networks, attention, and residual blocks.
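To make the bottleneck concrete, the sketch below contrasts naive per-example clipping (one backward pass per example) with a batched variant for a single linear layer that exploits the outer-product structure of the weight gradient, in the spirit of efficient per-example gradient computations. This is a minimal illustration with assumed toy data and loss, not the paper's implementation; bias terms are omitted for brevity.

import torch
import torch.nn as nn

# Minimal sketch (assumed toy setup): one linear layer, squared-error loss, batch of B examples.
torch.manual_seed(0)
B, d_in, d_out = 8, 5, 3
layer = nn.Linear(d_in, d_out)
x = torch.randn(B, d_in)
y = torch.randn(B, d_out)
clip_norm = 1.0

# Naive per-example clipping: one backward pass per example, so the GPU sees batches of size 1.
naive_sum = torch.zeros_like(layer.weight)
for i in range(B):
    layer.zero_grad()
    loss_i = ((layer(x[i:i+1]) - y[i:i+1]) ** 2).sum()
    loss_i.backward()
    g = layer.weight.grad
    scale = torch.clamp(clip_norm / (g.norm() + 1e-6), max=1.0)
    naive_sum += scale * g

# Batched alternative: for a linear layer, example i's weight gradient is the outer product
# delta_i x_i^T, so its Frobenius norm is ||delta_i|| * ||x_i||. One batched forward/backward
# pass yields all per-example norms, after which the clipped gradients are summed at once.
out = layer(x)                                    # (B, d_out)
loss = ((out - y) ** 2).sum()
delta = torch.autograd.grad(loss, out)[0]         # (B, d_out): per-example grads w.r.t. outputs
per_ex_norms = delta.norm(dim=1) * x.norm(dim=1)  # (B,): per-example weight-gradient norms
scales = torch.clamp(clip_norm / (per_ex_norms + 1e-6), max=1.0)
fast_sum = torch.einsum('b,bo,bi->oi', scales, delta, x)  # sum of clipped per-example grads

print(torch.allclose(naive_sum, fast_sum, atol=1e-5))  # True: both give the same clipped sum

The batched variant keeps the GPU working on full batches while only ever materializing per-example gradient norms, never the per-example gradients themselves; extending this idea beyond a single linear layer (to convolutions, recurrence, attention, and residual blocks) is where layer-specific back-propagation analysis becomes necessary.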
