Adversarial Semantic Collisions

We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks that rely on analyzing the meaning and similarity of texts (including paraphrase identification, document retrieval, response suggestion, and extractive summarization) are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to the top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at https://github.com/csong27/collision-bert.
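As a rough illustration of how a gradient-based collision search can work, the sketch below uses a HotFlip-style first-order approximation to pick a vocabulary substitution that increases a BERT pair classifier's similarity score. This is a generic illustration under our own assumptions (a vanilla bert-base-uncased model with an untrained pair-classification head standing in for a fine-tuned paraphrase or retrieval model, a single substitution position, and made-up example texts), not the paper's exact collision-generation algorithm.

```python
# Minimal sketch of a HotFlip-style, gradient-guided token substitution against
# a BERT pair classifier. All names and hyperparameters here are illustrative
# assumptions, not the paper's configuration.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# In practice the target would be a fine-tuned paraphrase/retrieval model; the
# untrained pair-classification head here just keeps the snippet self-contained.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

target = "What is the capital of France?"        # hypothetical target text
collision = "random unrelated seed text"          # hypothetical collision seed

# Encode the (target, collision) pair as BERT expects.
enc = tokenizer(target, collision, return_tensors="pt")
emb_matrix = model.get_input_embeddings().weight  # [vocab_size, dim]
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)

# Score the pair (here: the logit of class 1, standing in for "similar")
# and backpropagate to the input embeddings.
logits = model(inputs_embeds=embeds,
               attention_mask=enc["attention_mask"],
               token_type_ids=enc["token_type_ids"]).logits
score = logits[0, 1]
score.backward()

# First-order estimate of how much substituting vocabulary token v at position
# `pos` changes the score: (e_v - e_cur)^T grad_pos. The e_cur term is constant
# per position, so ranking candidates only needs emb_matrix @ grad_pos.
grad = embeds.grad[0]                              # [seq_len, dim]
pos = int(enc["token_type_ids"][0].nonzero()[0])   # first token of the collision segment
best_token = int(torch.argmax(emb_matrix @ grad[pos]))

print("proposed substitution at position", pos, "->", tokenizer.decode([best_token]))
```

The paper's actual attacks go beyond this single greedy step, iterating gradient-guided updates over the collision text and also producing collisions that evade perplexity-based filtering.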
