Neural versus phrase-based MT quality: An in-depth analysis on English-German and English-French

Abstract: Within the field of statistical machine translation, the neural approach (NMT) is currently pushing ahead the state-of-the-art performance traditionally achieved by phrase-based approaches (PBMT), and is rapidly becoming the dominant technology in machine translation. Indeed, in the last IWSLT and WMT evaluation campaigns on machine translation, NMT outperformed well-established state-of-the-art PBMT systems on many different language pairs. To understand in what respects NMT provides better translation quality than PBMT, we perform a detailed analysis of neural versus phrase-based statistical machine translation outputs, leveraging high-quality post-edits performed by professional translators on the IWSLT data. In this analysis, we focus on two language directions with different characteristics: English–German, known to be particularly hard because of morphology and syntactic differences, and English–French, where PBMT systems typically reach outstanding quality and thus represent a strong competitor for NMT. Our analysis provides useful insights on what linguistic phenomena are best modelled by neural models – such as the reordering of verbs and nouns – while pointing out other aspects that remain to be improved – like the correct translation of proper nouns.

Arianna Bisazza | Mauro Cettolo | Marcello Federico | Luisa Bentivogli
