Further Meta-Evaluation of Machine Translation

This paper analyzes the translation quality of machine translation systems for 10 language pairs translating between Czech, English, French, German, Hungarian, and Spanish. We report the translation quality of over 30 diverse translation systems based on a large-scale manual evaluation involving hundreds of hours of effort. We use the human judgments of the systems to analyze automatic evaluation metrics for translation quality, and we report the strength of the correlation with human judgments at both the system level and the sentence level. We validate our manual evaluation methodology by measuring intra- and inter-annotator agreement and by collecting timing information.
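
The abstract names two measurements: how strongly an automatic metric correlates with human judgments, and how well annotators agree with each other. The minimal Python sketch below illustrates both under stated assumptions that the abstract itself does not confirm. It assumes system-level correlation is Spearman's rank correlation (no-ties formula), and that agreement is a kappa coefficient K = (P(A) - P(E)) / (1 - P(E)) with a fixed chance-agreement rate P(E). The function names `ranks`, `spearman_rho`, and `kappa`, and all scores and labels, are hypothetical.

```python
"""Sketch of two statistics a metric meta-evaluation might use.

Assumptions (not taken from the abstract): system-level correlation is
Spearman's rank correlation without ties, and annotator agreement is a
kappa coefficient with a fixed chance-agreement rate P(E).
"""
from __future__ import annotations

from typing import Hashable, Sequence


def ranks(scores: Sequence[float]) -> list[int]:
    """Return 1-based ranks, highest score first (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(human: Sequence[float], metric: Sequence[float]) -> float:
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no-ties formula."""
    n = len(human)
    if n != len(metric) or n < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    # d is the rank difference of each system under human vs. metric scores.
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(human), ranks(metric)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable],
          p_chance: float) -> float:
    """K = (P(A) - P(E)) / (1 - P(E)) with a fixed chance rate P(E)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    # P(A): fraction of items on which the two annotators gave the same label.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return (p_agree - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical scores for four systems (illustrative numbers only).
    human_scores = [0.62, 0.55, 0.48, 0.30]
    metric_scores = [0.28, 0.30, 0.22, 0.15]
    print(spearman_rho(human_scores, metric_scores))  # 0.8

    # Hypothetical pairwise judgments ("<", ">", "=") from two annotators;
    # with three possible outcomes, chance agreement is assumed to be 1/3.
    ann1 = ["<", ">", "=", "<", ">", "<"]
    ann2 = ["<", ">", "<", "<", ">", "="]
    print(kappa(ann1, ann2, p_chance=1 / 3))  # approximately 0.5
```

The no-ties Spearman formula keeps the sketch short; real system scores can tie, in which case average ranks (as computed by, for example, `scipy.stats.spearmanr`) would be needed instead.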
