Synthetic and Natural Noise Both Break Neural Machine Translation

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
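To make the synthetic noise concrete, here is a minimal Python sketch of three of the scrambling operations studied in the paper: swapping two adjacent inner characters, shuffling all characters except the first and last, and shuffling the whole word. The function names and sampling details are illustrative assumptions, not the paper's implementation; the paper additionally uses keyboard-typo substitutions and natural errors harvested from error-correction corpora, which are not reproduced here.

```python
import random

def swap(word):
    """Swap two adjacent inner characters (words shorter than 4 chars are left intact)."""
    if len(word) < 4:
        return word
    i = random.randint(1, len(word) - 3)  # never touch the first or last character
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def middle_random(word):
    """Shuffle every character except the first and last."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def fully_random(word):
    """Shuffle all characters of the word."""
    chars = list(word)
    random.shuffle(chars)
    return "".join(chars)

def noise_sentence(sentence, noiser):
    """Apply a word-level noiser to each whitespace-separated token.
    Punctuation attached to a token is scrambled along with it; a real
    pipeline would tokenize first."""
    return " ".join(noiser(token) for token in sentence.split())

if __name__ == "__main__":
    random.seed(0)
    print(noise_sentence("according to a researcher at Cambridge", middle_random))
```

Applying a noiser like this to source-side text before decoding is enough to reproduce the qualitative effect described above: character-level NMT models degrade sharply under scrambling that human readers handle with little effort.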
