Adversarial Subword Regularization for Robust Neural Machine Translation

Exposing diverse subword segmentations to neural machine translation (NMT) models often improves the robustness of machine translation: as NMT models experience various subword candidates, they become more robust to segmentation errors. However, the distribution of subword segmentations depends heavily on a subword language model, from which erroneous segmentations of unseen words are unlikely to be sampled. In this paper, we present adversarial subword regularization (ADVSR) to study whether gradient signals during training can serve as a substitute criterion for choosing a segmentation among candidates. We show experimentally that our model-based adversarial samples effectively encourage NMT models to be less sensitive to segmentation errors and improve the robustness of NMT models on low-resource datasets.
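To make the selection criterion concrete, the sketch below shows one way it could look in code: instead of sampling a segmentation from the subword language model (as in standard subword regularization), the training loop picks, among n-best candidates, the segmentation on which the current model incurs the highest loss. This is a minimal sketch under stated assumptions, not the authors' released implementation: the SentencePiece n-best call is real API, while `loss_fn` is a hypothetical stand-in for the NMT training loss, and the exhaustive per-candidate evaluation simplifies the paper's gradient-signal criterion, which avoids a full forward pass per candidate.

```python
# A minimal sketch of adversarial segmentation selection as the abstract
# describes it: among the n-best subword segmentations of a sentence,
# train on the one the current model finds hardest (highest loss).
# Assumptions: `loss_fn` stands in for the NMT training loss, and this
# exhaustive search replaces the paper's cheaper gradient-based criterion.
from typing import Callable, List

import sentencepiece as spm


def pick_adversarial_segmentation(
    sp: spm.SentencePieceProcessor,
    sentence: str,
    loss_fn: Callable[[List[str]], float],
    nbest: int = 8,
) -> List[str]:
    """Return the candidate segmentation with the highest model loss."""
    # Enumerate up to `nbest` valid subword segmentations of the sentence.
    candidates = sp.nbest_encode_as_pieces(sentence, nbest)
    # The adversarial choice is the segmentation the model currently
    # handles worst, rather than one sampled from the subword LM.
    return max(candidates, key=loss_fn)
```

The key design point conveyed here is that the segmentation distribution becomes model-dependent: hard (including error-like) segmentations of unseen words can be selected even when the subword language model would assign them low probability.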
