Neural versus Phrase-Based Machine Translation Quality: a Case Study

Within the field of Statistical Machine Translation (SMT), the neural approach (NMT) has recently emerged as the first technology able to challenge the long-standing dominance of phrase-based approaches (PBMT). In particular, at the IWSLT 2015 evaluation campaign, NMT outperformed well-established state-of-the-art PBMT systems on English-German, a language pair known to be particularly hard because of morphological and syntactic differences. To understand in what respects NMT provides better translation quality than PBMT, we perform a detailed analysis of neural versus phrase-based SMT outputs, leveraging high-quality post-edits performed by professional translators on the IWSLT data. For the first time, our analysis provides useful insights into which linguistic phenomena are best modeled by neural models, such as the reordering of verbs, while pointing out other aspects that remain to be improved.

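The analysis described above relies on comparing each system's output with its human post-edit. A common way to turn such post-edits into a quality score is an HTER-style measure: the fewer edits a translator needs to make, the better the output. The following is a minimal sketch of that idea, not the authors' actual pipeline; it uses plain word-level edit distance normalized by post-edit length, omitting the block-shift operations of full TER for brevity.

```python
# Hedged sketch: HTER-like scoring of MT output against a human post-edit.
# Real TER/HTER also allows block shifts; this simplified version counts only
# insertions, deletions, and substitutions.

def word_edit_distance(hyp_tokens, ref_tokens):
    """Levenshtein distance over word tokens."""
    m, n = len(hyp_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp_tokens[i - 1] == ref_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

def hter_like(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word."""
    hyp, ref = mt_output.split(), post_edit.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)

# Fewer required edits -> lower score -> better MT quality.
print(hter_like("the cat sat on mat", "the cat sat on the mat"))  # 1/6 ≈ 0.167
```

Scores of this kind, computed per sentence over the NMT and PBMT outputs, would allow the quality of the two systems to be compared directly on the same post-edited test set.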