Findings of the 2016 Conference on Machine Translation

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.
