An Efficient DP-SGD Mechanism for Large Scale NLP Models

Recent advances in deep learning have drastically improved performance on many Natural Language Understanding (NLU) tasks. However, the data used to train NLU models may contain private information such as addresses or phone numbers, particularly when drawn from human subjects. It is desirable that the underlying models do not expose private information contained in the training data. Differentially Private Stochastic Gradient Descent (DP-SGD) has been proposed as a mechanism for building privacy-preserving models. However, DP-SGD can be prohibitively slow to train. In this work, we propose a more efficient DP-SGD for training on GPU infrastructure and apply it to fine-tuning models based on LSTM and transformer architectures. We report faster training times, alongside accuracy, theoretical privacy guarantees, and the success rate of membership inference attacks for our models, and observe that fine-tuning with the proposed variant of DP-SGD yields competitive models with no significant degradation in training time and with improved privacy protection. We also observe that a looser theoretical (ε, δ) guarantee can translate into significant practical privacy gains.
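
As background for the mechanism the abstract builds on: standard DP-SGD (Abadi et al., "Deep Learning with Differential Privacy") clips each per-example gradient to an L2 norm bound C, averages over the batch, and adds Gaussian noise with standard deviation proportional to σC, where σ is the noise multiplier; a randomized mechanism M is (ε, δ)-differentially private if Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ for all outcome sets S and all datasets D, D′ differing in one record. The minimal numpy sketch below shows only this baseline update step, not the GPU-efficient variant proposed here; the function name dp_sgd_step and the hyperparameter values are illustrative assumptions.

```python
import numpy as np


def dp_sgd_step(per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One baseline DP-SGD update: clip each example's gradient to
    L2 norm <= clip_norm, average over the batch, then add Gaussian
    noise with std = noise_multiplier * clip_norm on the gradient sum
    (i.e. divided by batch size on the average)."""
    clipped = [
        g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        for g in per_example_grads
    ]
    batch_size = len(clipped)
    noisy_mean = np.mean(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / batch_size, size=clipped[0].shape
    )
    return -lr * noisy_mean  # additive update for the parameter vector


# Toy usage: 32 fake per-example gradients for a 10-dim parameter vector.
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(32)]
w = np.zeros(10)
w += dp_sgd_step(grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

The per-example clipping step is why naive DP-SGD requires materializing a gradient per training example rather than a single batch gradient; that overhead is the usual source of the slowdown noted in the abstract and the cost that a more efficient GPU implementation must address.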
