What do character-level models learn about morphology? The case of dependency parsing

When parsing morphologically-rich languages with neural models, it is beneficial to model input at the character level, and it has been claimed that this is because character-level models learn morphology. We test these claims by comparing character-level models to an oracle with access to explicit morphological analysis on twelve languages with varying morphological typologies. Our results highlight many strengths of character-level models, but also show that they are poor at disambiguating some words, particularly in the face of case syncretism. We then demonstrate that explicitly modeling morphological case improves our best model, showing that character-level models can benefit from targeted forms of explicit morphological modeling.
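The character-level models discussed here compose each word's representation from its characters, typically with a bidirectional LSTM in the style of Ling et al. (2015). Below is a minimal illustrative sketch of that composition in PyTorch; the paper's own implementation used Chainer, and all class names, dimensions, and the toy vocabulary here are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (PyTorch, not the paper's Chainer code) of a character-level
# word encoder in the style of Ling et al. (2015): each word's embedding is
# composed from the final states of a bidirectional LSTM over its characters.
# All hyperparameters and names are illustrative assumptions.
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    def __init__(self, n_chars: int, char_dim: int = 32, word_dim: int = 64):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim)
        # Bidirectional LSTM run over the character sequence of one word.
        self.bilstm = nn.LSTM(char_dim, word_dim // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, max_word_len) integer character indices.
        chars = self.char_embed(char_ids)           # (batch, len, char_dim)
        _, (h_n, _) = self.bilstm(chars)            # h_n: (2, batch, word_dim//2)
        # Concatenate final forward and backward states as the word vector.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, word_dim)

# Toy usage: encode "cats" and "cat" over a tiny character vocabulary.
vocab = {ch: i + 1 for i, ch in enumerate("acst")}  # index 0 reserved for padding
batch = torch.tensor([[vocab[c] for c in "cats"],
                      [vocab[c] for c in "cat"] + [0]])
encoder = CharWordEncoder(n_chars=len(vocab) + 1)
print(encoder(batch).shape)  # torch.Size([2, 64])
```

Because the word vector is built from shared character states, orthographically related forms such as "cat" and "cats" receive related representations; this is precisely the kind of morphological sensitivity the paper probes, and the kind that can fail under case syncretism, where distinct analyses share one surface form.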
