Findings of the 2013 Workshop on Statistical Machine Translation

We present the results of the WMT13 shared tasks, which included a translation task, a task for run-time estimation of machine translation quality, and an unofficial metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation tasks. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually in our largest manual evaluation to date. The quality estimation task had four subtasks, with a total of 14 teams submitting 55 entries.
