The Role of Context in Neural Morphological Disambiguation

Languages with rich morphology often introduce sparsity in language processing tasks. While morphological analyzers can reduce this sparsity by providing morpheme-level analyses for words, they often introduce ambiguity by returning multiple analyses for the same surface form. The problem of disambiguating between these morphological parses is further complicated by the fact that the correct parse for a word depends not only on its surface form but also on the other words in its context. In this paper, we present a language-agnostic approach to morphological disambiguation. We address the problem of using context in morphological disambiguation by presenting several LSTM-based neural architectures that encode long-range surface-level and analysis-level contextual dependencies. We apply our approach to Turkish, Russian, and Arabic to compare effectiveness across languages, matching state-of-the-art results in two of the three languages. Our results also demonstrate that while context plays a role in learning how to disambiguate, the type and amount of context needed vary between languages.
