Neural Machine Translation of Rare Words with Subword Units

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.1 and 1.3 BLEU, respectively.
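The byte pair encoding (BPE) segmentation described above works by starting from a character-level vocabulary and repeatedly merging the most frequent pair of adjacent symbols into a new symbol, so frequent words end up as single units while rare words decompose into subwords. A minimal sketch of the learning procedure in Python (the toy vocabulary and number of merges are illustrative; `</w>` marks end of word):

```python
import re
import collections

def get_stats(vocab):
    """Count the frequency of each adjacent symbol pair in the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Toy corpus vocabulary: each word is a sequence of space-separated
# symbols (initially characters), with its corpus frequency.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

merges = []
for _ in range(10):               # number of merges = target subword vocab size
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
    merges.append(best)

print(merges[:3])                 # most frequent merges learned first
```

At translation time, a new word is segmented by applying the learned merge operations in order, so unseen words are still representable as known subword units rather than a single unknown-word token.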
