Unsupervised Resource Creation for Textual Inference Applications

This paper explores how a battery of unsupervised techniques can be used to create large, high-quality corpora for textual inference applications, such as systems for recognizing textual entailment (TE) and textual contradiction (TC). We show that it is possible to automatically generate sets of positive and negative instances of textual entailment and contradiction from textual corpora with greater than 90% precision. We describe how we generated more than 1 million TE pairs, and a corresponding set of 500,000 TC pairs, from the documents found in the 2 GB AQUAINT-2 newswire corpus.
