Progress in Machine Translation

Abstract

After more than 70 years of evolution, machine translation has made great progress. Translation quality has improved especially markedly in recent years with the emergence of neural machine translation (NMT). In this article, we first review the history of machine translation, from rule-based machine translation to example-based and statistical machine translation. We then introduce NMT in more detail, including its basic framework and the currently dominant Transformer framework, as well as multilingual translation models that address the data sparseness problem. In addition, we introduce cutting-edge simultaneous translation methods that strike a balance between translation quality and latency. Next, we describe various products and applications of machine translation. Finally, we briefly discuss challenges and future research directions in this field.
