Synchronous Interactive Decoding for Multilingual Neural Machine Translation

Simultaneously translating a source language into multiple different target languages is one of the most common scenarios of multilingual translation. However, existing methods cannot make full use of the information in the translation model during decoding, such as intra-lingual and inter-lingual future information, and therefore may suffer from issues such as unbalanced outputs. In this paper, we present a new approach for synchronous interactive multilingual neural machine translation (SimNMT), which predicts the output of each target language simultaneously and interactively using the historical and future information of all target languages. Specifically, we first propose a synchronous cross-interactive decoder in which the generation of each target output depends not only on its already generated sequence, but also on its future information, as well as on the history and future contexts of the other target languages. We then present a new interactive multilingual beam search algorithm that enables synchronous interactive decoding of all target languages in a single model. We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and MNMT models.

Introduction

Neural machine translation (NMT) has greatly improved translation quality (Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2015; Gehring et al. 2017; Vaswani et al. 2017) and has promoted the study of multilingual translation. Thanks to the powerful end-to-end modeling capability of the encoder-decoder framework, multiple language pairs can be handled in a single model. As the deployment cost across multiple language pairs is significantly reduced, the single-model approach has become a promising paradigm for multilingual NMT (MNMT). Training a single model on multiple language pairs can leverage the complementary information of different languages (Johnson et al. 2017), for example by enabling zero-resource translation or improving the quality of low-resource translation.

However, existing methods cannot make full use of the information in the model during decoding. Some works (Johnson et al. 2017; Wang et al. 2018) support multilingual translation with a single model, but the translation of each sentence in a batch is independent. Multi-target translation (Dong et al. 2015) supports translating a source sentence into several different target languages through a model with one shared encoder and several different decoders. Although it employs multiple decoders, the model can still handle only one language pair at a time during decoding, which brings two problems: (i) the decoding process cannot use the complementary information among different languages; (ii) because decoding depends only on historical information and ignores future information, it suffers from unbalanced target language generation, i.e., the prefixes of sentences are predicted better than the suffixes (Liu et al. 2016).

Several studies have explored these two issues. To exploit multilingual complementary information, Wang et al. (2019) synchronously translate a sentence into two different target languages and allow each generated sequence to attend to the other language's ongoing generation, which improves translation quality.
However, since only historical information is adopted during translation, this approach still faces the unbalanced output problem. Some studies (Liu et al. 2016; Zhang et al. 2018; Zhou, Zhang, and Zong 2019; Zhou et al. 2019; Zhang et al. 2020) alleviate the problem by introducing bidirectional decoding, which provides both historical and future information during decoding. However, these works cannot support bidirectional decoding for multiple languages in a single decoder.

In fact, multilingual conversation is quite a common scenario in which one sentence needs to be simultaneously translated into multiple other languages (e.g., international group chats, conversations, and meetings). It is therefore a meaningful and promising direction to design a synchronous interactive multilingual NMT model that enables the historical and future information of different target languages to interact with each other to improve translation performance. For example (Figure 1), a traditional NMT model translates an English sentence into a Chinese sentence from left to right (L2R), depending only on the historical information that has already been generated (the blue box only). For multi-target MNMT, however, all the forward (L2R) and backward (right-to-left, R2L) sequences of the different target languages share the same semantics. Therefore, at each step, the generation of each directed target sequence can draw on both the historical and the future information of all the other target sequences.
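The decoder and search procedure are not reproduced in this excerpt, so the following is only a minimal, hypothetical Python sketch of the idea behind interactive multilingual beam search: every directed target stream (e.g. Chinese L2R, Chinese R2L, German L2R, German R2L) is extended by one token per step, and each candidate token is scored conditioned on the source plus the current partial hypotheses of all other streams, so that the history and future of every target language can interact. The stream names, the score_next_tokens callback, and all other identifiers are placeholders for illustration, not the authors' implementation.

# Hypothetical sketch of synchronous interactive multilingual beam search.
# Each directed target stream is decoded in lock-step; the score of a candidate
# token may depend on the partial outputs of all other streams ("interaction").
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

EOS = "</s>"

@dataclass
class Hypothesis:
    tokens: List[str] = field(default_factory=list)
    score: float = 0.0  # accumulated log-probability

    def finished(self) -> bool:
        return bool(self.tokens) and self.tokens[-1] == EOS

def interactive_beam_search(
    source: List[str],
    streams: List[str],  # e.g. ["zh:l2r", "zh:r2l", "de:l2r", "de:r2l"]
    score_next_tokens: Callable[..., List[Tuple[str, float]]],  # placeholder for the cross-interactive decoder
    beam_size: int = 4,
    max_len: int = 50,
) -> Dict[str, Hypothesis]:
    """Decode all directed target streams synchronously in one loop."""
    beams: Dict[str, List[Hypothesis]] = {s: [Hypothesis()] for s in streams}

    for _ in range(max_len):
        # Snapshot the best partial hypothesis of every stream; this is the
        # context the other streams can interactively condition on.
        context = {s: beams[s][0].tokens for s in streams}

        for s in streams:
            candidates: List[Hypothesis] = []
            for hyp in beams[s]:
                if hyp.finished():
                    candidates.append(hyp)
                    continue
                # Cross-interactive conditioning: own history plus the ongoing
                # forward/backward hypotheses of the other streams.
                others = {o: context[o] for o in streams if o != s}
                for token, logp in score_next_tokens(source, s, hyp.tokens, others)[:beam_size]:
                    candidates.append(Hypothesis(hyp.tokens + [token], hyp.score + logp))
            beams[s] = sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]

        if all(beams[s][0].finished() for s in streams):
            break

    return {s: beams[s][0] for s in streams}

# Toy usage with a dummy scorer that simply copies the source and then stops.
def toy_scorer(source, stream, prefix, others):
    next_token = source[len(prefix)] if len(prefix) < len(source) else EOS
    return [(next_token, 0.0)]

if __name__ == "__main__":
    print(interactive_beam_search(["hello", "world"], ["zh:l2r", "zh:r2l"], toy_scorer))

In a real system the scoring callback would be a shared cross-interactive decoder returning log-probabilities over the target vocabulary, the R2L streams would be reversed before output, and the best directed hypothesis per language would be selected at the end; those details are omitted here.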

References

[1] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. ACL, 2007.
[2] Jonas Gehring et al. Convolutional Sequence to Sequence Learning. ICML, 2017.
[3] Daxiang Dong et al. Multi-Task Learning for Multiple Language Translation. ACL, 2015.
[4] Kishore Papineni et al. BLEU: a Method for Automatic Evaluation of Machine Translation. ACL, 2002.
[5] Xiangwen Zhang et al. Asynchronous Bidirectional Decoding for Neural Machine Translation. AAAI, 2018.
[6] Lemao Liu et al. Agreement on Target-bidirectional Neural Machine Translation. NAACL, 2016.
[7] Yining Wang et al. Synchronously Generating Two Languages with Interactive Decoding. EMNLP, 2019.
[8] Yining Wang et al. Three Strategies to Improve One-to-Many Multilingual Translation. EMNLP, 2018.
[9] Dzmitry Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.
[10] Melvin Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. TACL, 2017.
[11] Long Zhou et al. Synchronous Bidirectional Inference for Neural Sequence Generation. Artificial Intelligence, 2019.
[12] Long Zhou et al. Sequence Generation: From Both Sides to the Middle. IJCAI, 2019.
[13] Long Zhou et al. Synchronous Bidirectional Neural Machine Translation. TACL, 2019.
[14] Vu Cong Duy Hoang et al. Iterative Back-Translation for Neural Machine Translation. NMT@ACL, 2018.
[15] Ashish Vaswani et al. Attention Is All You Need. NIPS, 2017.
[16] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units. ACL, 2016.
[17] Ilya Sutskever et al. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.