Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition

Natural language inference (NLI), the task of determining whether one sentence entails another, is an increasingly important benchmark for natural language understanding. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by “some” as entailments. For some presupposition triggers like “only”, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment-canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.
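The semi-automatic generation behind IMPPRES can be pictured as template filling over a small lexicon. The sketch below is an illustrative reconstruction, not the paper's actual generation code: the `SUBJECTS` and `PREDICATES` lists and the some/all scale are assumptions chosen to show the idea of producing premise–hypothesis pairs whose gold label depends on whether the scalar implicature is drawn.

```python
# Sketch of template-based generation of scalar-implicature NLI pairs, in the
# spirit of IMPPRES's semi-automatic construction. The templates, lexicon, and
# labels here are illustrative assumptions, not the paper's generation code.
from itertools import product

SUBJECTS = ["students", "doctors"]
PREDICATES = ["passed the exam", "attended the talk"]

def scalar_pairs(weak="some", strong="all"):
    """Return (premise, hypothesis, label) triples for one scalar pair.

    Pragmatically, the weak term ("some") implicates the negation of the
    strong term ("not all"), so a model that draws the implicature should
    label each pair an entailment; under a purely logical reading the same
    pair is neutral, since "some" is compatible with "all".
    """
    pairs = []
    for subj, pred in product(SUBJECTS, PREDICATES):
        premise = f"{weak.capitalize()} {subj} {pred}."
        hypothesis = f"Not {strong} {subj} {pred}."
        pairs.append((premise, hypothesis, "entailment (pragmatic reading)"))
    return pairs

if __name__ == "__main__":
    for premise, hypothesis, label in scalar_pairs():
        print(f"{premise} -> {hypothesis}  [{label}]")
```

Crossing two subjects with two predicates yields four pairs per scalar trigger; scaling the same recipe across many triggers and lexical items is how a diagnostic set on the order of 32K pairs becomes feasible without hand-writing each example.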

[1] M. Marelli et al. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment, *SEMEVAL, 2014.

[2] Christopher Potts. Presupposition and Implicature, 2015.

[3] H. Savin et al. The projection problem for presuppositions, 1971.

[4] Yejin Choi et al. Event2Mind: Commonsense Inference on Events, Intents, and Reactions, ACL, 2018.

[5] Lingli Fu. Polishing "Using Language", Advocating New Ideas, 2014.

[6] Holger Schwenk et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP, 2017.

[7] Jason Weston et al. Dialogue Natural Language Inference, ACL, 2018.

[8] Ido Dagan et al. Addressing Discourse and Document Structure in the RTE Search Task, TAC, 2009.

[9] Siobhan Chapman. Logic and Conversation, 2005.

[10] Irene Heim et al. On the Projection Problem for Presuppositions, 2008.

[11] Shikha Bordia et al. Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs, EMNLP, 2019.

[12] Allyson Ettinger et al. Probing for semantic evidence of composition by means of simple classification tasks, RepEval@ACL, 2016.

[13] Judith Tonhauser et al. The CommitmentBank: Investigating projection in naturally occurring discourse, 2019.

[14] R. Thomas McCoy et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, ACL, 2019.

[15] Laurence R. Horn. A Natural History of Negation, 1989.

[16] Robert Stalnaker. Presuppositions of Compound Sentences, 2008.

[17] Mohit Bansal et al. Analyzing Compositionality-Sensitivity of NLI Models, AAAI, 2018.

[18] Marie-Catherine de Marneffe et al. Do You Know That Florence Is Packed with Visitors? Evaluating State-of-the-art Models of Speaker Commitment, ACL, 2019.

[19] Ido Dagan et al. The Third PASCAL Recognizing Textual Entailment Challenge, ACL-PASCAL@ACL, 2007.

[20] Laurence R. Horn et al. The Handbook of Pragmatics, 2004.

[21] J. Hintikka. On denoting what?, Synthese, 2005.

[22] Christopher D. Manning. Local Textual Inference: It's Hard to Circumscribe, but You Know It When You See It - and NLP Needs It, 2006.

[23] Christopher Potts et al. Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences, arXiv, 2018.

[24] Hinrich Schütze et al. SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference, ACL, 2019.

[25] Johan Bos et al. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning, *SEMEVAL, 2019.

[26] Yonatan Bisk et al. Natural Language Inference from Multiple Premises, IJCNLP, 2017.

[27] Judith Degen et al. Investigating the distribution of some (but not all) implicatures using corpora and web-based methods, 2015.

[28] Rui Yan et al. Natural Language Inference by Tree-Based Convolution and Heuristic Matching, ACL, 2015.

[29] Lauri Karttunen et al. Local Textual Inference: Can it be Defined or Circumscribed?, EMSEE@ACL, 2005.

[30] Aaron Steven White et al. The role of veridicality and factivity in clause selection, 2017.

[31] S. Levinson. Presumptive Meanings: The Theory of Generalized Conversational Implicature, 2001.

[32] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL, 2019.

[33] Gerald Gazdar. Pragmatics: Implicature, Presupposition, and Logical Form, 1978.

[34] Guillaume Lample et al. XNLI: Evaluating Cross-lingual Sentence Representations, EMNLP, 2018.

[35] Rachel Rudinger et al. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation, BlackboxNLP@EMNLP, 2018.

[36] Rachel Rudinger et al. Lexicosyntactic Inference in Neural Models, EMNLP, 2018.

[37] Omer Levy et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, BlackboxNLP@EMNLP, 2018.

[38] Yoav Goldberg et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, ACL, 2018.

[39] Shalom Lappin et al. The Handbook of Contemporary Semantic Theory, 2015.

[40] J. Quiñonero-Candela et al. Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, Lecture Notes in Computer Science, 2006.

[41] Francis Ferraro et al. Semantic Proto-Roles, TACL, 2015.

[42] Peter Clark et al. SciTaiL: A Textual Entailment Dataset from Science Question Answering, AAAI, 2018.

[43] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, EMNLP, 2014.

[44] Sanja Fidler et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, ICCV, 2015.

[45] Alex Wang et al. Probing What Different NLP Tasks Teach Machines about Function Word Comprehension, *SEMEVAL, 2019.

[46] Ido Dagan et al. Recognizing Textual Entailment: Models and Applications, 2013.

[47] Jackie Chi Kit Cheung et al. Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers, ACL, 2018.

[48] Doug Downey et al. Abductive Commonsense Reasoning, ICLR, 2019.

[49] M. Lyons. Presupposition, Encyclopedia of Autism Spectrum Disorders, 2021.

[50] Yejin Choi et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference, EMNLP, 2018.

[51] Samuel R. Bowman et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, NAACL, 2017.

[52] Sheng Zhang et al. Ordinal Common-sense Inference, TACL, 2016.

[53] Omer Levy et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, arXiv, 2019.

[54] Ali Farhadi et al. HellaSwag: Can a Machine Really Finish Your Sentence?, ACL, 2019.

[55] Samuel R. Bowman et al. BLiMP: A Benchmark of Linguistic Minimal Pairs for English, SCiL, 2019.

[56] Thomas Lukasiewicz et al. e-SNLI: Natural Language Inference with Natural Language Explanations, NeurIPS, 2018.

[57] David Lewis. Scorekeeping in a language game, Journal of Philosophical Logic, 1979.

[58] Johan Bos et al. Can Neural Networks Understand Monotonicity Reasoning?, BlackboxNLP@ACL, 2019.

[59] Carolyn Penstein Rosé et al. Stress Test Evaluation for Natural Language Inference, COLING, 2018.

[60] Lawrence S. Moss et al. Probing Natural Language Inference Models through Semantic Fragments, AAAI, 2020.

[61] Kevin Duh et al. Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework, IJCNLP, 2017.

[62] Christopher Potts et al. A large annotated corpus for learning natural language inference, EMNLP, 2015.