A Morphology-Aware Network for Morphological Disambiguation

Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing because of their complex word morphology. A morphological disambiguator selects the correct morphological analysis of a word in context. Morphological disambiguation is important because it is generally one of the first steps of natural language processing, and its performance affects all subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained with deep learning models. However, the application of deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation, we also present results for French and German to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resources. In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French, respectively.
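The phrase "accuracy among the ambiguous words" can be made concrete with a small sketch. The Python below is an illustration only, not the authors' evaluation code: the function name `ambiguous_accuracy`, the data layout, and the toy analysis strings are all assumptions introduced here. It assumes that a word counts as ambiguous when the analyzer returns more than one candidate analysis, and that words with a single candidate are excluded from the score.

```python
from typing import Sequence


def ambiguous_accuracy(
    candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    predicted: Sequence[str],
) -> float:
    """Fraction of ambiguous tokens whose predicted analysis equals the gold one.

    A token is treated as ambiguous when it has two or more candidate
    analyses (an assumption made for this sketch).
    """
    correct = 0
    total = 0
    for cands, gold_analysis, pred_analysis in zip(candidates, gold, predicted):
        if len(cands) < 2:
            continue  # unambiguous tokens do not count toward the score
        total += 1
        if pred_analysis == gold_analysis:
            correct += 1
    if total == 0:
        raise ValueError("no ambiguous tokens to evaluate")
    return correct / total


# Toy data with hypothetical analysis strings (not taken from the paper).
candidates = [
    ["w1+Noun"],                # unambiguous, ignored
    ["w2+Noun", "w2+Verb"],     # ambiguous, predicted correctly
    ["w3+Adj", "w3+Noun"],      # ambiguous, predicted incorrectly
]
gold = ["w1+Noun", "w2+Verb", "w3+Adj"]
predicted = ["w1+Noun", "w2+Verb", "w3+Noun"]

score = ambiguous_accuracy(candidates, gold, predicted)
print(f"{score:.2%}")
```

Restricting the denominator to ambiguous tokens makes the metric stricter than accuracy over all tokens, since words with only one possible analysis are trivially correct.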
