
Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?

Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: Given a data sample and black-box access to a model’s API, determine whether the sample existed in the model’s training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information against several kinds of membership inference attacks.
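The problem setup above can be illustrated with a minimal sketch. This is not the attack evaluated in the paper: it assumes a single hypothetical heuristic, namely that a model translates sentences it was trained on more faithfully than unseen ones, and it thresholds a simple unigram-overlap F1 score (a stand-in for a real MT metric). The names `token_f1`, `infer_membership`, `toy_translate`, and the threshold value are all illustrative assumptions, and `toy_translate` is a toy stand-in for black-box API access.

```python
from collections import Counter
from typing import Callable, Tuple


def token_f1(hypothesis: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def infer_membership(
    sample: Tuple[str, str],
    translate: Callable[[str], str],
    threshold: float = 0.9,  # assumed value, would need tuning on held-out data
) -> bool:
    """Guess whether (source, reference) was in the training data.

    The attacker only calls `translate` (black-box access) and compares the
    output with the reference translation in the sample.
    """
    source, reference = sample
    return token_f1(translate(source), reference) >= threshold


# Toy stand-in for a translation API that has memorized one training pair.
MEMORIZED = {"guten morgen": "good morning"}


def toy_translate(source: str) -> str:
    return MEMORIZED.get(source, "unknown")


print(infer_membership(("guten morgen", "good morning"), toy_translate))
print(infer_membership(("gute nacht", "good night"), toy_translate))
```

A real system generalizes to unseen sentences far better than this toy, so a single threshold on output similarity may separate members from non-members poorly; the sketch only makes the inputs and outputs of the problem concrete.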

Kevin Duh | Matt Post | Sorami Hisamoto
