MiSS: An Assistant for Multi-Style Simultaneous Translation

In this paper, we present MiSS, an assistant for multi-style simultaneous translation. Our proposed translation system has five key features: highly accurate translation, simultaneous translation, translation for multiple text styles, back-translation for translation quality evaluation, and grammatical error correction. With this system, we aim to provide a complete translation experience for machine translation users. Our design goals are high translation accuracy, real-time translation, flexibility, and measurable translation quality. Compared with commonly used free commercial translation systems, our assistant treats the machine translation application as a more complete, fully featured tool for its users. By incorporating additional features and giving users better control over their experience, we improve translation efficiency and quality. Additionally, our assistant combines machine translation, grammatical error correction, and interactive editing, and uses a crowdsourcing mode to collect more data for further training, improving both the machine translation and grammatical error correction models. A short video demonstrating our system is available at https://www.youtube.com/watch?v=ZGCo7KtRKd8.
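The back-translation quality check mentioned above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the two `translate_*` functions are hypothetical stand-ins for real MT models, and `difflib` string similarity substitutes for a learned metric such as BERTScore.

```python
import difflib

def translate_en_to_de(text: str) -> str:
    # Hypothetical stand-in for a real EN->DE translation model.
    lookup = {"hello world": "hallo welt"}
    return lookup.get(text, text)

def translate_de_to_en(text: str) -> str:
    # Hypothetical stand-in for a real DE->EN translation model.
    lookup = {"hallo welt": "hello world"}
    return lookup.get(text, text)

def back_translation_score(source: str) -> float:
    """Round-trip the source through the forward and backward models,
    then compare the result with the original as a rough, reference-free
    signal of translation quality (1.0 = perfect round trip)."""
    round_trip = translate_de_to_en(translate_en_to_de(source))
    return difflib.SequenceMatcher(None, source, round_trip).ratio()

score = back_translation_score("hello world")
```

In a deployed system the score would be surfaced to the user alongside the translation, so that low round-trip similarity can prompt an interactive edit before the output is accepted.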
