Neural Morphological Tagging of Lemma Sequences for Machine Translation

Translating into morphologically rich languages is difficult because the large number of inflected word forms leads to data sparsity. In this work we perform a pilot study on predicting morphologically rich POS tags for sequences of lemmas. Similar studies have been conducted in the context of phrase-based statistical machine translation. We implement a state-of-the-art tagger that takes lemmas as input and show that it can predict the morphologically rich POS tags with accuracies of up to 91%.
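
To make the task setup concrete, the following minimal sketch shows the input and output format (a sequence of lemmas in, one morphological tag per lemma out) and how token-level accuracy can be computed. It is not the neural tagger studied in this work: it uses a simple most-frequent-tag baseline as a stand-in, and the function names, the toy German sentences, and the tag strings are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each lemma to its most frequent tag (a toy stand-in for a real tagger)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for lemma, tag in sentence:
            counts[lemma][tag] += 1
            all_tags[tag] += 1
    # Unknown lemmas fall back to the overall most frequent tag.
    fallback = all_tags.most_common(1)[0][0]
    lexicon = {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}
    return lexicon, fallback


def tag_lemmas(lemmas, lexicon, fallback):
    """Predict one morphological tag per input lemma."""
    return [lexicon.get(lemma, fallback) for lemma in lemmas]


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical training data: (lemma, morphological tag) pairs.
train = [
    [("der", "ART.Nom.Sg.Masc"), ("Hund", "NN.Nom.Sg.Masc"), ("schlafen", "VVFIN.3.Sg.Pres")],
    [("der", "ART.Nom.Sg.Masc"), ("Hund", "NN.Nom.Sg.Masc"), ("bellen", "VVFIN.3.Sg.Pres")],
    [("der", "ART.Acc.Sg.Fem"), ("Katze", "NN.Acc.Sg.Fem")],
]
lexicon, fallback = train_baseline(train)
predicted = tag_lemmas(["der", "Katze"], lexicon, fallback)
accuracy = token_accuracy([["ART.Nom.Sg.Fem", "NN.Acc.Sg.Fem"]], [predicted])
```

The baseline ignores context, so it cannot tell nominative "der" from accusative "der"; a sequence model that reads the surrounding lemmas is what the tagging task calls for.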
