PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined at the sentence or paragraph level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. The dataset is structured around two tasks: (1) segmenting each sentence in a document into its set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically aligned document, i.e. a document describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
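To make the two tasks concrete, the following is a minimal Python sketch of how a segmented sentence and its per-proposition labels might be represented, and how those labels could compose into a sentence-level NLI label. Everything here is an assumption for illustration: the `Label` and `Proposition` types, the three-way label set, the plain-text rendering of propositions, the example sentence, and the `aggregate` rule are hypothetical and are not taken from the corpus format or the authors' method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    # Standard three-way NLI label set (assumed here, not stated in the abstract).
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass
class Proposition:
    text: str     # hypothetical plain-text rendering of one proposition
    label: Label  # relation of this proposition to the premise document


def aggregate(props: List[Proposition]) -> Label:
    """Compose proposition labels into one sentence-level label.

    Assumed rule (illustrative only): any contradicted proposition makes
    the sentence contradicted; the sentence is entailed only if every
    proposition is entailed; otherwise it is neutral.
    """
    if not props:
        raise ValueError("a sentence needs at least one proposition")
    labels = [p.label for p in props]
    if Label.CONTRADICTION in labels:
        return Label.CONTRADICTION
    if all(label is Label.ENTAILMENT for label in labels):
        return Label.ENTAILMENT
    return Label.NEUTRAL


# Made-up hypothesis sentence: "The storm hit Florida on Monday and caused flooding."
# Task (1) would yield the propositions; task (2) would label each one
# against a topically aligned premise document.
example = [
    Proposition("The storm hit Florida", Label.ENTAILMENT),
    Proposition("The storm hit on Monday", Label.NEUTRAL),
    Proposition("The storm caused flooding", Label.ENTAILMENT),
]

sentence_label = aggregate(example)
```

In this made-up case the sentence as a whole would be labeled neutral, while the proposition-level labels show that only the date is unsupported by the premise. That is the kind of finer-grained explanation the proposition-level view is meant to support.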
