An Efficient DP-SGD Mechanism for Large Scale NLU Models

Recent advances in deep learning have drastically improved performance on many Natural Language Understanding (NLU) tasks. However, the data used to train NLU models may contain private information such as addresses or phone numbers, particularly when drawn from human subjects, and it is desirable that the underlying models do not expose private information contained in their training data. Differentially Private Stochastic Gradient Descent (DP-SGD) has been proposed as a mechanism for building privacy-preserving models, but it can be prohibitively slow to train. In this work, we propose a more efficient DP-SGD mechanism for training on GPU infrastructure and apply it to fine-tuning models based on LSTM and transformer architectures. We report faster training times, alongside accuracy, theoretical privacy guarantees, and the success rate of membership inference attacks against our models, and we observe that fine-tuning with the proposed variant of DP-SGD yields competitive models without significant degradation in training time, while improving privacy protection. We also observe that looser theoretical guarantees (larger ϵ and δ) can still translate into significant practical privacy gains.
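For context, the standard DP-SGD recipe (Abadi et al., 2016) modifies each SGD step by clipping every per-example gradient to a fixed L2 norm and adding calibrated Gaussian noise to the averaged update. The sketch below is a minimal NumPy illustration of that baseline step, not the authors' GPU-efficient variant; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative choices, not names from the paper.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One baseline DP-SGD update: clip each example's gradient to
    clip_norm, average, then add Gaussian noise scaled to the clip bound.
    (Illustrative sketch only; not the paper's efficient GPU variant.)"""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:  # one gradient vector per training example
        norm = np.linalg.norm(g)
        # Scale down gradients whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise std after averaging: noise_multiplier * clip_norm / batch size.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

# Usage: 8 per-example gradients over a 4-dimensional parameter vector.
params = np.zeros(4)
grads = np.random.default_rng(0).normal(size=(8, 4))
params = dp_sgd_step(params, grads)
```

The per-example clipping loop is what makes naive DP-SGD slow in practice: it forces the framework to materialize one gradient per training example instead of one per batch, which is precisely the overhead that efficient GPU variants such as the one proposed here aim to reduce.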
