IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation

This paper describes the UPC submissions to the WMT14 Metrics Shared Task: UPC-IPA and UPC-STOUT. These metrics use a collection of evaluation measures integrated in ASIYA, a toolkit for machine translation evaluation. In addition to some standard metrics, the two submissions take advantage of novel metrics that consider linguistic structures, lexical relationships, and semantics to compare both the source and the reference translation against the candidate translation. The new metrics are available for several target languages other than English. In the official WMT14 evaluation, UPC-IPA and UPC-STOUT scored above the average in 7 out of 9 language pairs at the system level and 8 out of 9 at the segment level.
