Compositional Representation of Morphologically-Rich Input for Neural Machine Translation

Neural machine translation (NMT) models are typically trained with fixed-size input and output vocabularies, which creates an important bottleneck for their accuracy and generalization capability. As a solution, various studies have proposed segmenting words into sub-word units and performing translation at the sub-lexical level. However, statistical word segmentation methods have recently been shown to be prone to morphological errors, which can lead to inaccurate translations. In this paper, we propose to overcome this problem by replacing the source-language embedding layer of NMT with a bi-directional recurrent neural network that generates compositional representations of the input at any desired level of granularity. We test our approach in a low-resource setting with five languages of different morphological typologies and under different composition assumptions. By training NMT to compose word representations from character n-grams, our approach consistently outperforms (by 1.71 to 2.48 BLEU points) an NMT baseline that learns embeddings of statistically generated sub-word units.
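To make the proposed input layer concrete, the following is a minimal sketch of how a word representation can be composed from character n-gram embeddings with a bi-directional recurrent network in place of a word-embedding lookup. It is written in PyTorch purely for illustration; the class name, dimensions, use of an LSTM, and the choice of concatenating the final forward and backward states are assumptions of this sketch, not the authors' implementation (which builds on Nematus/Theano), and padding positions are not masked for brevity.

```python
import torch
import torch.nn as nn

class CompositionalWordEncoder(nn.Module):
    """Sketch: represent each source word by running a bi-directional LSTM
    over embeddings of its character n-grams and concatenating the final
    forward and backward hidden states (hypothetical configuration)."""

    def __init__(self, ngram_vocab_size: int, ngram_dim: int = 64, word_dim: int = 512):
        super().__init__()
        # Id 0 is reserved as padding for words with fewer n-grams.
        self.ngram_embed = nn.Embedding(ngram_vocab_size, ngram_dim, padding_idx=0)
        self.birnn = nn.LSTM(ngram_dim, word_dim // 2,
                             batch_first=True, bidirectional=True)

    def forward(self, ngram_ids: torch.Tensor) -> torch.Tensor:
        # ngram_ids: (num_words, max_ngrams_per_word) integer ids, padded with 0.
        embedded = self.ngram_embed(ngram_ids)      # (W, N, ngram_dim)
        _, (h_n, _) = self.birnn(embedded)          # h_n: (2, W, word_dim // 2)
        # The composed vectors replace the usual word-embedding lookup
        # that feeds the NMT encoder.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (W, word_dim)


# Toy usage: two words, each split into up to four character trigram ids.
encoder = CompositionalWordEncoder(ngram_vocab_size=1000)
ngram_ids = torch.tensor([[5, 17, 42, 0],
                          [7, 99, 3, 12]])
print(encoder(ngram_ids).shape)  # torch.Size([2, 512])
```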
