Morphological Analysis Using a Sequence Decoder

We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation, and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and to outperform whole-tag models. In addition, generating morphological features as a sequence rather than, for example, as an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource, and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology data set. Our Morse implementation and the TrMor2018 data set are available online to support future research. See https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet (Yuret, 2016) and https://github.com/ai-ku/TrMor2018 for the new Turkish data set.
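The output format described above (lemma characters followed by individual feature tokens) can be illustrated with a small Python sketch. This is not the Morse implementation, which is written in Julia/Knet. The function names, the toy training analyses, and the feature labels are all assumptions chosen for illustration. The sketch only shows how an analysis maps to a decoder target sequence, and why a feature-level vocabulary can cover a tag combination that a whole-tag vocabulary has never seen.

```python
from typing import List, Set, Tuple


def to_target_sequence(lemma: str, features: List[str]) -> List[str]:
    """Build a decoder target: lemma characters, then one token per feature."""
    return list(lemma) + features


def from_target_sequence(
    tokens: List[str], feature_vocab: Set[str]
) -> Tuple[str, List[str]]:
    """Split a token sequence back into (lemma, features).

    Assumes feature tokens never coincide with single lemma characters.
    """
    split = next(
        (i for i, tok in enumerate(tokens) if tok in feature_vocab), len(tokens)
    )
    return "".join(tokens[:split]), tokens[split:]


# Hypothetical toy training analyses: (lemma, individual features).
train = [
    ("masal", ["Noun", "A3sg", "Pnon", "Acc"]),
    ("gel", ["Verb", "Pos", "Past", "A3sg"]),
    ("ev", ["Noun", "A3pl", "Pnon", "Nom"]),
]

# A whole-tag model needs every combined tag in its output vocabulary.
tag_vocab = {"+".join(feats) for _, feats in train}
# A feature-level decoder only needs each individual feature.
feature_vocab = {f for _, feats in train for f in feats}

# A combination that never occurs as a whole tag in the toy training data.
new_features = ["Noun", "A3sg", "Pnon", "Nom"]
whole_tag = "+".join(new_features)

seq = to_target_sequence("masal", ["Noun", "A3sg", "Pnon", "Acc"])
print(seq)
print(from_target_sequence(seq, feature_vocab))
print("whole tag seen in training:", whole_tag in tag_vocab)
print("all features seen in training:", all(f in feature_vocab for f in new_features))
```

In this toy setting the combined tag is outside the whole-tag vocabulary, while each of its features is individually available to a sequence decoder. Because the target is a sequence, its length is not fixed, so additional features for further inflectional groups can simply be appended.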

[1]  Kemal Oflazer.  Morphological Processing for Turkish , 2018.

[2]  Deniz Yuret.  Knet : beginning deep learning with 100 lines of Julia , 2016.

[3]  Kemal Oflazer,et al.  Statistical Dependency Parsing for Turkish , 2006, EACL.

[4]  Kemal Oflazer,et al.  Dependency Parsing of Turkish , 2008, CL.

[5]  Aibek Makazhanov,et al.  Character-Aware Neural Morphological Disambiguation , 2017, ACL.

[6]  Kemal Oflazer,et al.  Two-level Description of Turkish Morphology , 1993, EACL.

[7]  Gökhan Tür,et al.  Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation , 1996, EMNLP.

[8]  Ryan Cotterell,et al.  Cross-lingual Character-Level Neural Morphological Tagging , 2017, EMNLP.

[9]  Josef van Genabith,et al.  An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages , 2017, EACL.

[10]  Ilyas Cicekli,et al.  A Rule-Based Morphological Disambiguator for Turkish , 2007.

[11]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[12]  Murat Saraclar,et al.  Morphological Disambiguation of Turkish Text with Perceptron Algorithm , 2009, CICLing.

[13]  Josef van Genabith,et al.  Learning Morphology with Morfette , 2008, LREC.

[14]  Grzegorz Chrupala,et al.  Simple Data-Driven Context-Sensitive Lemmatization , 2006, Proces. del Leng. Natural.

[15]  Caglar Tirkaz,et al.  A Morphology-Aware Network for Morphological Disambiguation , 2016, AAAI.

[16]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[17]  Alexander M. Fraser,et al.  Joint Lemmatization and Morphological Tagging with Lemming , 2015, EMNLP.

[18]  Daoud Daoud.  Synchronized Morphological and Syntactic Disambiguation for Arabic , 2009.

[19]  Atro Voutilainen,et al.  A language-independent system for parsing unrestricted text , 1995.

[20]  Hinrich Schütze,et al.  Efficient Higher-Order CRFs for Morphological Tagging , 2013, EMNLP.

[21]  Kimmo Koskenniemi,et al.  Two-Level Model for Morphological Analysis , 1983, IJCAI.

[22]  Graham Neubig,et al.  Neural Factor Graph Models for Cross-lingual Morphological Tagging , 2018, ACL.

[23]  Deniz Yuret,et al.  Learning Morphological Disambiguation Rules for Turkish , 2006, NAACL.

[24]  Kemal Oflazer,et al.  Tagging and Morphological Disambiguation of Turkish Text , 1994, ANLP.

[25]  Chris Dyer,et al.  The Role of Context in Neural Morphological Disambiguation , 2016, COLING.

[26]  Jan Hajič,et al.  The Best of Two Worlds: Cooperation of Statistical and Rule-Based Taggers for Czech , 2007, ACL.

[27]  Kemal Oflazer,et al.  Morphological Disambiguation for Turkish , 2018.

[28]  Sampo Pyysalo,et al.  Universal Dependencies v1: A Multilingual Treebank Collection , 2016, LREC.

[29]  Gökhan Tür,et al.  Statistical Morphological Disambiguation for Agglutinative Languages , 2000, COLING.