Quality expectations of machine translation

Machine Translation (MT) is deployed for a range of use-cases by millions of people every day, so there should be no doubt as to its utility. However, not everyone is convinced that MT can be useful, especially as a productivity enhancer for human translators. In this chapter, I address this issue, describing how MT is currently deployed, how its output is evaluated, and how that evaluation could be enhanced, especially as MT quality itself improves. Central to these issues is the acceptance that there is no longer a single 'gold standard' measure of quality: the situation in which MT is deployed must be borne in mind, especially with respect to the expected 'shelf-life' of the translation itself.
