Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks

The use of automatic word alignment to capture sentence-level semantic relations is common to a number of cross-lingual NLP applications. Despite its proven usefulness, however, word alignment information is typically considered from a quantitative point of view (e.g. the number of alignments), disregarding qualitative aspects (the importance of the aligned terms). In this paper we demonstrate that integrating qualitative information can bring significant performance improvements with negligible impact on system complexity. Focusing on the cross-lingual textual entailment task, we contribute a novel method that: i) significantly outperforms the state of the art, and ii) is portable, with limited loss in performance, to language pairs for which training data are not available.
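The contrast between quantitative and qualitative alignment information can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not say how term importance is measured, so the IDF weighting, the function names (`idf_weights`, `alignment_features`), and the toy corpus below are all assumptions made for illustration. The alignment itself is taken as given, as a list of aligned token positions.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Inverse document frequency per token (assumed importance measure)."""
    n = len(corpus)
    df = Counter(tok for sent in corpus for tok in set(sent))
    return {tok: math.log(n / df[tok]) for tok in df}


def alignment_features(tokens, aligned_idx, weights):
    """Return (quantitative, qualitative) alignment coverage for one sentence.

    quantitative: fraction of tokens that have an alignment.
    qualitative:  fraction of the total term weight carried by aligned tokens.
    """
    aligned = set(aligned_idx)
    quantitative = len(aligned) / len(tokens)
    total = sum(weights.get(t, 0.0) for t in tokens)
    covered = sum(
        weights.get(t, 0.0) for i, t in enumerate(tokens) if i in aligned
    )
    qualitative = covered / total if total else 0.0
    return quantitative, qualitative


# Hypothetical toy corpus; "the" occurs in every sentence, so its IDF is 0.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "ran"],
]
weights = idf_weights(corpus)
sentence = ["the", "dog", "barked"]

# Two alignments with the same number of aligned words.
with_function_word = alignment_features(sentence, [0, 1], weights)
content_words_only = alignment_features(sentence, [1, 2], weights)
```

Both alignments cover two of the three tokens, so a purely count-based feature cannot tell them apart, whereas the weighted feature is higher when the aligned tokens are the informative ones.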
