Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional, naturally giving rise to a pair of directional tasks and two directional learning signals. However, typical seq2seq neural networks are simplex: they model only one unidirectional task and therefore cannot fully exploit the bidirectional learning signals available in parallel data. To address this issue, we propose a duplex seq2seq neural network, REDER (REversible Duplex TransformER), and apply it to machine translation. The architecture of REDER has two ends, each of which specializes in one language so as to read and yield sequences in that language. As a result, REDER can learn from the bidirectional signals simultaneously, and it enables reversible machine translation by simply flipping its input and output ends. Experiments on widely used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation, which brings considerable gains over several strong baselines.
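To make the "flipping the ends" idea concrete, the sketch below shows a minimal duplex stack built from exactly invertible additive-coupling blocks (in the spirit of RevNet/NICE-style reversible layers). It is a conceptual illustration only, not the REDER implementation: the names (DuplexStack, ReversibleBlock, src_to_tgt, tgt_to_src) and the toy feed-forward sub-layers are assumptions of this sketch, and the real model uses reversible Transformer layers plus further components beyond this example.

```python
# Conceptual sketch of a duplex, reversible seq2seq stack (NOT the authors' REDER code).
# Each end has its own embedding/output layer; the shared reversible core is run
# forwards for source->target and backwards (via exact inverses) for target->source.
import torch
import torch.nn as nn


class ReversibleBlock(nn.Module):
    """Additive coupling block (RevNet/NICE style): exactly invertible."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def reverse(self, y1, y2):
        # Exact inverse of forward(): recover (x1, x2) from (y1, y2).
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


class DuplexStack(nn.Module):
    """Two 'ends' around one reversible stack: the same parameters serve both directions."""

    def __init__(self, src_vocab, tgt_vocab, dim=256, depth=4):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, 2 * dim)   # source-language end
        self.tgt_embed = nn.Embedding(tgt_vocab, 2 * dim)   # target-language end
        self.src_out = nn.Linear(2 * dim, src_vocab)
        self.tgt_out = nn.Linear(2 * dim, tgt_vocab)
        self.blocks = nn.ModuleList(ReversibleBlock(dim) for _ in range(depth))

    def _run(self, h, reverse=False):
        # Split features into the two coupling streams, run the stack in the
        # requested direction, then concatenate the streams back together.
        x1, x2 = h.chunk(2, dim=-1)
        blocks = reversed(self.blocks) if reverse else self.blocks
        for blk in blocks:
            x1, x2 = blk.reverse(x1, x2) if reverse else blk(x1, x2)
        return torch.cat([x1, x2], dim=-1)

    def src_to_tgt(self, src_tokens):
        h = self._run(self.src_embed(src_tokens), reverse=False)
        return self.tgt_out(h)          # logits over the target vocabulary

    def tgt_to_src(self, tgt_tokens):
        h = self._run(self.tgt_embed(tgt_tokens), reverse=True)
        return self.src_out(h)          # logits over the source vocabulary


# Toy usage: one model, two translation directions.
model = DuplexStack(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))     # (batch, length)
tgt = torch.randint(0, 120, (2, 7))
print(model.src_to_tgt(src).shape)      # torch.Size([2, 7, 120])
print(model.tgt_to_src(tgt).shape)      # torch.Size([2, 7, 100])
```

The property the sketch preserves is the one the abstract emphasizes: a single set of parameters serves both translation directions, and the reverse direction is obtained by running the same reversible blocks backwards through their exact inverses rather than by training a second, separate model.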
