Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition

Natural language inference (NLI), the task of determining whether one sentence entails another, is increasingly important for natural language understanding. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs that illustrate well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and bag-of-words (BOW) NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences: it reliably treats scalar implicatures triggered by “some” as entailments. For some presupposition triggers like “only”, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment-canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.
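As a concrete illustration of the evaluation setup, the sketch below probes a MultiNLI-trained model with IMPPRES-style premise/hypothesis pairs and reads off its three-way prediction, using the Hugging Face transformers library. The checkpoint name is an illustrative stand-in for any MultiNLI-finetuned classifier, and the example sentences are constructed here for illustration rather than taken from IMPPRES itself.

# A minimal sketch, assuming any NLI classifier fine-tuned on MultiNLI;
# the checkpoint name below is illustrative, not the paper's exact model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # illustrative MultiNLI-finetuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def nli_label(premise: str, hypothesis: str) -> str:
    """Return the model's predicted label (entailment/neutral/contradiction)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order differs across checkpoints, so read it from the model config.
    return model.config.id2label[logits.argmax(dim=-1).item()]

# Scalar implicature: "some" pragmatically suggests "not all".
print(nli_label("Some of the students passed the exam.",
                "Not all of the students passed the exam."))

# Presupposition projection: the presupposition of "only" survives negation.
print(nli_label("The guest did not only leave a tip.",
                "The guest left a tip."))

On the paper's account, a model that has learned the scalar implicature should label the first pair as entailment, and a model that projects the presupposition of “only” through negation should do the same for the second.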

[1] H. Savin, et al. The projection problem for presuppositions, 1971.

[2] Gerald Gazdar, et al. Pragmatics: Implicature, Presupposition, and Logical Form, 1978.

[3] David Lewis, et al. Scorekeeping in a language game, 1979, J. Philos. Log.

[4] Laurence R. Horn. A Natural History of Negation, 1989.

[5] S. Levinson. Presumptive Meanings: The Theory of Generalized Conversational Implicature, 2001.

[6] Shalom Lappin, et al. The Handbook of Contemporary Semantic Theory, 2015.

[7] Laurence R. Horn, et al. The Handbook of Pragmatics, 2004.

[8] Lauri Karttunen, et al. Local Textual Inference: Can it be Defined or Circumscribed?, 2005, EMSEE@ACL.

[9] J. Hintikka. On denoting what?, 2005, Synthese.

[10] Siobhan Chapman. Logic and Conversation, 2005.

[11] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.

[12] Christopher D. Manning. Local Textual Inference: It's Hard to Circumscribe, but You Know It When You See It - and NLP Needs It, 2006.

[13] J. Quinonero Candela, et al. Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, 2006, Lecture Notes in Computer Science.

[14] Robert Stalnaker, et al. Presuppositions of Compound Sentences, 2008.

[15] Irene Heim, et al. On the Projection Problem for Presuppositions, 2008.

[16] Ido Dagan, et al. Addressing Discourse and Document Structure in the RTE Search Task, 2009, TAC.

[17] Ido Dagan, et al. Recognizing Textual Entailment: Models and Applications, 2013.

[18] M. Marelli, et al. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment, 2014, *SEMEVAL.

[19] Lingli Fu. Polishing "Using Language" and Advocating New Ideas, 2014.

[20] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[21] Judith Degen, et al. Investigating the distribution of some (but not all) implicatures using corpora and web-based methods, 2015.

[22] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, IEEE International Conference on Computer Vision (ICCV).

[23] Christopher Potts. Presupposition and Implicature, 2015.

[24] Francis Ferraro, et al. Semantic Proto-Roles, 2015, TACL.

[25] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[26] Allyson Ettinger, et al. Probing for semantic evidence of composition by means of simple classification tasks, 2016, RepEval@ACL.

[27] Rui Yan, et al. Natural Language Inference by Tree-Based Convolution and Heuristic Matching, 2015, ACL.

[28] Sheng Zhang, et al. Ordinal Common-sense Inference, 2016, TACL.

[29] Kevin Duh, et al. Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework, 2017, IJCNLP.

[30] Yonatan Bisk, et al. Natural Language Inference from Multiple Premises, 2017, IJCNLP.

[31] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.

[32] Aaron Steven White, et al. The role of veridicality and factivity in clause selection, 2017.

[33] Carolyn Penstein Rosé, et al. Stress Test Evaluation for Natural Language Inference, 2018, COLING.

[34] Guillaume Lample, et al. XNLI: Evaluating Cross-lingual Sentence Representations, 2018, EMNLP.

[35] Yoav Goldberg, et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, 2018, ACL.

[36] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[37] Rachel Rudinger, et al. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation, 2018, BlackboxNLP@EMNLP.

[38] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[39] Jackie Chi Kit Cheung, et al. Let’s do it “again”: A First Computational Approach to Detecting Adverbial Presupposition Triggers, 2018, ACL.

[40] Yejin Choi, et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference, 2018, EMNLP.

[41] Yejin Choi, et al. Event2Mind: Commonsense Inference on Events, Intents, and Reactions, 2018, ACL.

[42] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations, 2018, NeurIPS.

[43] Peter Clark, et al. SciTaiL: A Textual Entailment Dataset from Science Question Answering, 2018, AAAI.

[44] Rachel Rudinger, et al. Lexicosyntactic Inference in Neural Models, 2018, EMNLP.

[45] Christopher Potts, et al. Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences, 2018, arXiv.

[46] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.

[47] Alex Wang, et al. Probing What Different NLP Tasks Teach Machines about Function Word Comprehension, 2019, *SEMEVAL.

[48] Jason Weston, et al. Dialogue Natural Language Inference, 2018, ACL.

[49] Johan Bos, et al. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning, 2019, *SEMEVAL.

[50] Judith Tonhauser, et al. The CommitmentBank: Investigating projection in naturally occurring discourse, 2019.

[51] Shikha Bordia, et al. Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs, 2019, EMNLP.

[52] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.

[53] Marie-Catherine de Marneffe, et al. Do You Know That Florence Is Packed with Visitors? Evaluating State-of-the-art Models of Speaker Commitment, 2019, ACL.

[54] Sebastian Schuster, et al. Harnessing the richness of the linguistic signal in predicting pragmatic inferences, 2019, arXiv.

[55] Johan Bos, et al. Can Neural Networks Understand Monotonicity Reasoning?, 2019, BlackboxNLP@ACL.

[56] Mohit Bansal, et al. Analyzing Compositionality-Sensitivity of NLI Models, 2018, AAAI.

[57] Hinrich Schütze, et al. SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference, 2019, ACL.

[58] Ali Farhadi, et al. HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, ACL.

[59] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[60] Ashish Sabharwal, et al. Probing Natural Language Inference Models through Semantic Fragments, 2019, AAAI.

[61] Doug Downey, et al. Abductive Commonsense Reasoning, 2019, ICLR.

[62] Samuel R. Bowman, et al. BLiMP: A Benchmark of Linguistic Minimal Pairs for English, 2019, SCIL.

[63] M. Lyons. Presupposition, 2021, Encyclopedia of Autism Spectrum Disorders.