UnNatural Language Inference

Recent investigations into the inner workings of state-of-the-art large-scale pre-trained Transformer-based Natural Language Understanding (NLU) models indicate that they appear to understand human-like syntax, at least to some extent. We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as they do to the originals, i.e., they are largely invariant to random word-order permutations. This behavior notably differs from that of humans, who struggle to understand the meaning of ungrammatical sentences. To measure the severity of this issue, we propose a suite of metrics and investigate which properties of particular permutations lead models to be word-order invariant. For example, in the MNLI dataset we find that almost all (98.7%) examples contain at least one permutation that elicits the gold label. Models are even able to assign gold labels to permutations of examples that they originally failed to predict correctly. We provide a comprehensive empirical evaluation of this phenomenon, and further show that this issue exists in pre-Transformer RNN- and ConvNet-based encoders, as well as across multiple languages (English and Chinese). Our code and data are available at https://github.com/facebookresearch/unlu.

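To make the probing setup concrete, below is a minimal sketch (not the paper's released code) of how one might test a pretrained NLI model for word-order invariance: permute the words of a premise/hypothesis pair and check whether the predicted label survives. The model name (`roberta-large-mnli`), the number of permutations, and the example sentences are illustrative assumptions.

```python
# Minimal sketch of a word-order-invariance probe for an NLI model.
# Not the paper's official implementation; model name and settings are assumptions.
import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # assumed off-the-shelf MNLI classifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def predict_label(premise: str, hypothesis: str) -> int:
    """Return the argmax label id for a premise/hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))


def permute_words(sentence: str, seed: int) -> str:
    """Return a random word-order permutation of the sentence."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


# Hypothetical example pair, for illustration only.
premise = "A man is playing a guitar on stage."
hypothesis = "A musician is performing."

original = predict_label(premise, hypothesis)
# Count how many of k random permutations still receive the original label.
k = 10
matches = sum(
    predict_label(permute_words(premise, s), permute_words(hypothesis, s)) == original
    for s in range(k)
)
print(f"{matches}/{k} permuted examples keep the original label")
```

If the model were sensitive to word order in the way humans are, most permuted pairs would no longer receive the original label; the paper's finding is that, in practice, a large fraction of permutations still do.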