Design and realization of a modular architecture for textual entailment

Abstract: A key challenge at the core of many Natural Language Processing (NLP) tasks is the ability to determine which conclusions can be inferred from a given natural language text. This problem, called the Recognition of Textual Entailment (RTE), has prompted the development of a range of algorithms, methods, and technologies. Unfortunately, research on Textual Entailment (TE), like semantics research more generally, is fragmented into studies focussing on various aspects of semantics such as world knowledge, lexical and syntactic relations, or more specialized kinds of inference. This fragmentation has problematic practical consequences. Notably, interoperability among the existing RTE systems is poor, and reuse of resources and algorithms is mostly infeasible. This also makes systematic evaluations very difficult to carry out. Finally, potential end users of textual entailment face a wide array of approaches with little guidance on which to pick. Our response to this situation is the novel EXCITEMENT architecture, which was developed to enable and encourage the consolidation of methods and resources in the textual entailment area. It decomposes RTE into components with strongly typed interfaces. We specify (a) a modular linguistic analysis pipeline and (b) a decomposition of the ‘core’ RTE methods into top-level algorithms and subcomponents. We identify four major subcomponent types, including knowledge bases and alignment methods. The architecture was developed with a focus on generality, supporting all major approaches to RTE and encouraging language independence. We illustrate the feasibility of the architecture by constructing mappings of major existing systems onto the architecture. The practical implementation of this architecture forms the EXCITEMENT open platform. It is a suite of textual entailment algorithms and components that contains the three systems mapped onto the architecture, includes linguistic analysis pipelines for three languages (English, German, and Italian), and comprises a number of linguistic resources. By addressing the problems outlined above, the platform provides a comprehensive and flexible basis for research and experimentation in textual entailment and is available as open source software under the GNU General Public License.
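The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the EXCITEMENT platform's API: every name below (`AnalysisPipeline`, `KnowledgeBase`, `CoverageAlgorithm`, the 0.75 threshold, and the token-coverage heuristic itself) is a hypothetical stand-in chosen only to show how a linguistic analysis pipeline, a knowledge-base subcomponent, and a top-level algorithm might be separated behind typed interfaces.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol, Tuple


@dataclass(frozen=True)
class AnalyzedText:
    """Typed output of the (hypothetical) linguistic analysis pipeline."""
    tokens: Tuple[str, ...]


class AnalysisPipeline(Protocol):
    """Interface for a language-specific analysis pipeline (assumed name)."""
    def analyze(self, text: str) -> AnalyzedText: ...


class KnowledgeBase(Protocol):
    """Interface for one subcomponent type: a lexical knowledge base (assumed name)."""
    def entails(self, left: str, right: str) -> bool: ...


class WhitespacePipeline:
    """Toy pipeline: lowercases and splits on whitespace."""
    def analyze(self, text: str) -> AnalyzedText:
        return AnalyzedText(tuple(text.lower().split()))


class DictKnowledgeBase:
    """Toy knowledge base backed by a set of (left, right) lexical rules."""
    def __init__(self, rules: Iterable[Tuple[str, str]]):
        self._rules = set(rules)

    def entails(self, left: str, right: str) -> bool:
        return left == right or (left, right) in self._rules


@dataclass(frozen=True)
class Decision:
    entailment: bool
    coverage: float


class CoverageAlgorithm:
    """Toy top-level algorithm: the hypothesis is judged entailed when enough
    of its tokens are licensed by some text token via the knowledge base."""
    def __init__(self, pipeline: AnalysisPipeline, kb: KnowledgeBase,
                 threshold: float = 0.75):
        self.pipeline = pipeline
        self.kb = kb
        self.threshold = threshold

    def decide(self, text: str, hypothesis: str) -> Decision:
        t = self.pipeline.analyze(text)
        h = self.pipeline.analyze(hypothesis)
        if not h.tokens:
            # An empty hypothesis is treated as trivially covered.
            return Decision(True, 1.0)
        covered = sum(
            1 for hw in h.tokens
            if any(self.kb.entails(tw, hw) for tw in t.tokens)
        )
        coverage = covered / len(h.tokens)
        return Decision(coverage >= self.threshold, coverage)


algorithm = CoverageAlgorithm(
    WhitespacePipeline(),
    DictKnowledgeBase([("bought", "purchased")]),
)
positive = algorithm.decide("John bought a car", "John purchased a car")
negative = algorithm.decide("John bought a car", "Mary sold a house")
```

Because the algorithm depends only on the two interfaces, a different pipeline (for another language) or a different knowledge base could be swapped in without touching `CoverageAlgorithm`, which is the kind of reuse the architecture aims for.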
