Findings of the 2012 Workshop on Statistical Machine Translation

This paper presents the results of the WMT12 shared tasks, which included a translation task, a task for machine translation evaluation metrics, and a task for run-time estimation of machine translation quality. We conducted a large-scale manual evaluation of 103 machine translation systems submitted by 34 teams. We used the rankings of these systems to measure how strongly 12 automatic evaluation metrics correlate with human judgments of translation quality. We introduced a new quality estimation task this year and evaluated submissions from 11 teams.
