Assessing Unintended Memorization in Neural Discriminative Sequence Models

Despite their success in a multitude of tasks, neural models trained on natural language have been shown to memorize the intricacies of their training data, posing a potential privacy threat. In this work, we propose a metric to quantify unintended memorization in neural discriminative sequence models. The proposed metric, named d-exposure (discriminative exposure), leverages language ambiguity and classification confidence to reveal the model’s propensity to memorize. Through experiments on a named entity recognition task, we validate d-exposure as a measure of memorization. We further show that d-exposure does not measure overfitting: it does not increase when the model overfits.
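For intuition, here is a minimal sketch of how a rank-based exposure score of this kind could be computed, assuming d-exposure follows the general exposure formulation of Carlini et al. (the rank of an inserted canary among random candidates), with classification confidence standing in for perplexity. The function name, arguments, and exact formula below are illustrative assumptions, not the paper’s definition.

```python
import numpy as np

def d_exposure_sketch(canary_confidence: float,
                      candidate_confidences: list[float]) -> float:
    """Hypothetical rank-based exposure score for a discriminative model.

    Ranks the model's classification confidence on an inserted canary
    phrase against its confidence on random candidate phrases; a canary
    ranked far above chance suggests unintended memorization.
    """
    scores = np.asarray(candidate_confidences)
    # Rank 1 = the canary has the highest confidence of all candidates.
    rank = 1 + int(np.sum(scores > canary_confidence))
    total = len(scores) + 1
    # Exposure-style score: log2(candidate-space size) - log2(canary rank).
    return float(np.log2(total) - np.log2(rank))

# Example: a canary ranked first among 1,024 total candidates gets the
# maximal score log2(1024) = 10; a median-ranked canary scores near 1.
print(d_exposure_sketch(0.99, list(np.random.uniform(0.0, 1.0, size=1023))))
```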