Knowledge Unlearning for Mitigating Privacy Risks in Language Models

Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both of which require re-training the underlying LM. We propose knowledge unlearning as an alternative method for reducing privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performance for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once, and that unlearning is highly dependent on which kind of data (domain) is forgotten. Comparing against a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori, while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
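The core procedure described above is simple: instead of minimizing the language modeling loss on the target token sequences, maximize it. The following is a minimal PyTorch/HuggingFace sketch of such a gradient-ascent unlearning loop; the model name, learning rate, step count, and example target texts are illustrative assumptions rather than the authors' exact configuration (see the released repository for the actual implementation).

```python
# Minimal sketch of knowledge unlearning via gradient ascent (assumptions:
# any HuggingFace causal LM works; the model name, learning rate, step count,
# and target texts below are illustrative, not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Token sequences the model should forget (hypothetical private strings).
target_texts = [
    "John Doe's phone number is 555-0100.",
    "Jane Roe lives at 42 Example Street.",
]

for step in range(10):  # the abstract notes a few iterations can suffice
    for text in target_texts:
        batch = tokenizer(text, return_tensors="pt")
        outputs = model(**batch, labels=batch["input_ids"])
        # Ordinary training minimizes the LM loss on these tokens;
        # unlearning ascends it instead, i.e. minimizes the negated loss.
        (-outputs.loss).backward()
        optimizer.step()
        optimizer.zero_grad()
```

Since the abstract reports that sequential unlearning outperforms unlearning everything at once, in practice a loop like this would presumably be run per chunk of target sequences and stopped once those sequences can no longer be extracted from the model.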
