What do Neural Machine Translation Models Learn about Morphology?

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during training. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate how well they capture morphology, using extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects of the neural MT system and its ability to capture word structure.
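
As a rough illustration of the probing methodology the abstract describes, the sketch below (in PyTorch, with stand-in model sizes and toy data; none of this is the authors' released code) freezes a trained encoder, extracts per-word hidden states, and trains a small classifier to predict POS tags from them:

```python
import torch
import torch.nn as nn

# Minimal probing sketch, under assumed names: the real study trains
# full seq2seq NMT models; here a frozen embedding + LSTM stands in
# for the trained encoder whose representations are being evaluated.
torch.manual_seed(0)
VOCAB, HIDDEN, NUM_TAGS = 1000, 500, 42   # sizes are illustrative

embed = nn.Embedding(VOCAB, HIDDEN)
encoder = nn.LSTM(HIDDEN, HIDDEN, num_layers=2, batch_first=True)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad = False               # representations stay fixed

probe = nn.Linear(HIDDEN, NUM_TAGS)       # the only trainable part
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy supervision: random word ids and POS tags for 8 sentences of 20 words.
words = torch.randint(0, VOCAB, (8, 20))
tags = torch.randint(0, NUM_TAGS, (8, 20))

for step in range(100):
    with torch.no_grad():
        states, _ = encoder(embed(words))  # (batch, seq, HIDDEN)
    logits = probe(states)                 # per-word tag scores
    loss = loss_fn(logits.reshape(-1, NUM_TAGS), tags.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The probe's held-out tagging accuracy is then read as a measure of how much morphological information the frozen representations encode; varying which layer's states are fed to the probe, or probing decoder states instead, gives the comparisons along depth and encoder vs. decoder that the abstract mentions.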
