Umelb: Cross-lingual Textual Entailment with Word Alignment and String Similarity Features

This paper describes the University of Melbourne NLP group's submission to the Cross-lingual Textual Entailment shared task, our first tentative attempt at the task. Our approach uses parallel corpora and automatic word alignment to align text fragment pairs. It then uses statistics based on the unaligned words as features to decide, separately, whether each pair holds in the forward and in the backward direction, before combining these two decisions compositionally into the final four classes (forward, backward, bidirectional and no entailment). We also experiment with additional string similarity features.
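The pipeline above (word alignment, then unaligned-word statistics as features, then separate forward and backward decisions combined into four classes) can be illustrated with a minimal Python sketch. It rests on several assumptions not taken from the paper: the alignment is given as a set of (T1 index, T2 index) pairs, such as a word aligner trained on parallel corpora would output; the feature names are hypothetical; and a fixed threshold (also hypothetical, 0.2) stands in for the trained directional classifiers.

```python
from typing import Dict, List, Set, Tuple

# Alignment as (index in T1, index in T2) pairs; assumed valid indices.
Alignment = Set[Tuple[int, int]]


def unaligned_features(t1: List[str], t2: List[str], alignment: Alignment) -> Dict[str, float]:
    """Counts and ratios of words left unaligned on each side (hypothetical feature set)."""
    aligned_t1 = {i for i, _ in alignment}
    aligned_t2 = {j for _, j in alignment}
    un1 = len(t1) - len(aligned_t1)
    un2 = len(t2) - len(aligned_t2)
    return {
        "t1_unaligned": un1,
        "t2_unaligned": un2,
        "t1_unaligned_ratio": un1 / len(t1) if t1 else 0.0,
        "t2_unaligned_ratio": un2 / len(t2) if t2 else 0.0,
    }


def directional_decisions(feats: Dict[str, float], threshold: float = 0.2) -> Tuple[bool, bool]:
    """Stand-in for two trained binary classifiers (hypothetical threshold rule).

    Forward (T1 entails T2): little of T2 is left unexplained by T1.
    Backward (T2 entails T1): little of T1 is left unexplained by T2.
    """
    forward = feats["t2_unaligned_ratio"] <= threshold
    backward = feats["t1_unaligned_ratio"] <= threshold
    return forward, backward


def combine(forward: bool, backward: bool) -> str:
    """Compositional combination of the two directional decisions into four classes."""
    if forward and backward:
        return "bidirectional"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "no_entailment"


if __name__ == "__main__":
    t1 = "the black cat sleeps".split()
    t2 = "el gato duerme".split()
    alignment = {(0, 0), (2, 1), (3, 2)}  # the-el, cat-gato, sleeps-duerme
    feats = unaligned_features(t1, t2, alignment)
    print(combine(*directional_decisions(feats)))  # → forward
```

Here "black" is the only unaligned word, and it sits in T1, so T2 is fully covered by T1 and the pair is labelled forward. In a system like the one described, the two directional decisions would come from classifiers trained on such features rather than from a fixed cut-off.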
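The abstract does not specify which string similarity features were added. As one hypothetical example, the sketch below computes normalized Levenshtein edit-distance similarity at both the character and the token level. The function names and the choice of measure are assumptions for illustration, not a description of the submitted system.

```python
from typing import Dict, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: Sequence, b: Sequence) -> float:
    """1 minus edit distance scaled by the longer length; 1.0 for two empty inputs."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def string_similarity_features(t1: str, t2: str) -> Dict[str, float]:
    """Hypothetical character- and token-level similarity features for a text pair."""
    t1, t2 = t1.lower(), t2.lower()
    return {
        "char_lev_sim": normalized_similarity(t1, t2),
        "token_lev_sim": normalized_similarity(t1.split(), t2.split()),
    }


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # → 3
    print(string_similarity_features("Barack Obama visited Paris", "Barack Obama visitó París"))
```

Because the same `levenshtein` function accepts any sequence, it works on both character strings and token lists. Across languages, character-level similarity mainly rewards shared names, numbers and cognates, which is one reason such measures are usually treated as supplementary rather than primary features.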
