NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet with low batch sizes on the IPU

Differentially private SGD (DPSGD) has recently shown promise in deep learning. However, compared to non-private SGD, the DPSGD algorithm imposes computational overheads that can undo the benefit of batching on GPUs. Microbatching is a standard method to alleviate this and is fully supported in the TensorFlow Privacy library (TFDP). However, this technique, while improving training times, also reduces the quality of the gradients and degrades classification accuracy. Recent works, for example those using the JAX framework, show promise in alleviating this as well, but still show degradation in throughput from non-private to private SGD on CNNs and have not yet shown ImageNet implementations. In our work, we argue that low batch sizes using group normalization on ResNet-50 can yield high accuracy and privacy on Graphcore IPUs. This enables DPSGD training of ResNet-50 on ImageNet in just 6 hours (100 epochs) on an IPU-POD16 system.
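To make the microbatching trade-off concrete, the following is a minimal NumPy sketch of a single DPSGD gradient estimate. It is an illustration under stated assumptions, not the implementation used in this work or in TFDP: the function names `clip` and `dpsgd_gradient`, and the parameters `max_norm` and `noise_multiplier`, are hypothetical, and per-example gradients are assumed to be already computed. Setting `num_microbatches` equal to the batch size gives per-example clipping; smaller values average gradients within each group before clipping, which is cheaper but clips a coarser signal.

```python
import numpy as np


def clip(g, max_norm):
    """Scale g so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, max_norm / (norm + 1e-12))


def dpsgd_gradient(per_example_grads, num_microbatches, max_norm,
                   noise_multiplier, rng):
    """Return a clipped, noised gradient estimate for one batch.

    per_example_grads: array of shape (batch_size, num_params).
    num_microbatches: number of groups the batch is split into;
        batch_size gives per-example clipping, 1 clips the batch mean.
    """
    batch_size, num_params = per_example_grads.shape
    if batch_size % num_microbatches != 0:
        raise ValueError("batch_size must be divisible by num_microbatches")

    # Average the gradients inside each microbatch.
    groups = per_example_grads.reshape(num_microbatches, -1, num_params)
    micro_grads = groups.mean(axis=1)

    # Clip each microbatch gradient, then sum the clipped gradients.
    clipped_sum = sum(clip(g, max_norm) for g in micro_grads)

    # Add Gaussian noise scaled to the clipping norm, then normalize.
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=num_params)
    return (clipped_sum + noise) / num_microbatches


# Two examples, two parameters; noise disabled to isolate the clipping effect.
grads = np.array([[3.0, 4.0], [0.0, 0.0]])
rng = np.random.default_rng(0)
per_example = dpsgd_gradient(grads, 2, 1.0, 0.0, rng)
single_microbatch = dpsgd_gradient(grads, 1, 1.0, 0.0, rng)
```

With the noise switched off, the two settings return different gradients for the same batch, which shows that the choice of microbatch size changes the clipped signal itself and not only the cost of computing it.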
