Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data make it possible to capture both lexical relations between single words and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above the average scores reported by RTE participants on monolingual data. Finally, we show that paraphrase tables extracted from parallel corpora are also useful in the monolingual setting, improving on the results achieved with other sources of lexical knowledge.
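As a rough illustration of what phrasal matching over a phrase table can look like, the following Python sketch scores a text-hypothesis pair by the fraction of hypothesis phrases that have at least one phrase-table translation occurring in the text. It is a minimal sketch under stated assumptions, not the system evaluated in the paper: the function names (`ngrams`, `match_ratio`), the toy phrase table, the whitespace tokenization, the Spanish-to-English lookup direction, and the single coverage score are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-grams of `tokens` as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_ratio(text: str, hypothesis: str,
                phrase_table: Dict[str, Set[str]], max_n: int = 3) -> float:
    """Fraction of hypothesis phrases (1..max_n tokens) that have at least
    one phrase-table translation appearing as a phrase in the text."""
    t_tokens = text.lower().split()
    h_tokens = hypothesis.lower().split()
    # All phrases of the text up to max_n tokens, for fast lookup.
    text_phrases = {p for n in range(1, max_n + 1) for p in ngrams(t_tokens, n)}
    matched = 0
    total = 0
    for n in range(1, max_n + 1):
        for phrase in ngrams(h_tokens, n):
            total += 1
            # A phrase counts as matched if any of its translations is in the text.
            if phrase_table.get(phrase, set()) & text_phrases:
                matched += 1
    return matched / total if total else 0.0


# Toy phrase table (hypothetical entries, not taken from real parallel data).
toy_table = {
    "gato": {"cat"},
    "el gato": {"the cat"},
    "duerme": {"sleeps", "is sleeping"},
}
toy_text = "the cat is sleeping on the sofa"
toy_hypothesis = "el gato duerme"
print(match_ratio(toy_text, toy_hypothesis, toy_table))
```

In this toy case three of the six hypothesis phrases ("gato", "duerme", "el gato") find a translation in the text. A score of this kind could then be thresholded or passed to a classifier as a feature; that step is likewise an assumption of the sketch.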
