One size does not fit all: Investigating strategies for differentially-private learning across NLP tasks

Preserving privacy in contemporary NLP models allows us to work with sensitive data, but unfortunately comes at a price. We know that stricter privacy guarantees in differentially-private stochastic gradient descent (DP-SGD) generally degrade model performance. However, previous research on the efficiency of DP-SGD in NLP is inconclusive or even counter-intuitive. In this short paper, we provide an extensive analysis of different privacy-preserving strategies on seven downstream datasets in five different ‘typical’ NLP tasks of varying complexity, using modern neural models based on the BERT and XtremeDistil architectures. We show that, unlike standard non-private approaches to solving NLP tasks, where bigger is usually better, privacy-preserving strategies do not exhibit a winning pattern, and each task and privacy regime requires special treatment to achieve adequate performance.
