Parsing Morphologically Rich Languages with (Mostly) Off-The-Shelf Software and Word Vectors

As a contribution to the 2014 SPMRL shared task on parsing morphologically rich languages, we show that existing parsers can now achieve high dependency accuracy without intricate multi-parser schemes, even when only small amounts of training data are available. We further show that the impact of word vectors on parsing quality depends heavily on the amount of morphological information available. In addition, we discuss the use of parser scores for selecting morphological lattice paths and show that syntactic parsers have considerable discriminative power for morphological disambiguation.
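To make the idea of selecting a morphological lattice path by parser score concrete, the following is a minimal sketch and not the system described here. The lattice encoding, the function names (`enumerate_paths`, `select_path`, `toy_parse_score`), the example segmentations, and the length-normalized toy scores are all assumptions made for illustration; a real setup would replace `toy_parse_score` with the score of the best parse that an actual dependency parser assigns to each candidate path.

```python
from typing import Callable, Dict, List, Tuple

# Assumed lattice encoding: state -> list of (next_state, analysis) arcs.
Lattice = Dict[int, List[Tuple[int, str]]]


def enumerate_paths(lattice: Lattice, start: int, final: int) -> List[List[str]]:
    """Return every sequence of analyses that leads from start to final."""
    if start == final:
        return [[]]
    paths: List[List[str]] = []
    for next_state, analysis in lattice.get(start, []):
        for rest in enumerate_paths(lattice, next_state, final):
            paths.append([analysis] + rest)
    return paths


def select_path(
    lattice: Lattice,
    start: int,
    final: int,
    parse_score: Callable[[List[str]], float],
) -> List[str]:
    """Pick the lattice path whose parse receives the highest score."""
    candidates = enumerate_paths(lattice, start, final)
    if not candidates:
        raise ValueError("lattice has no path from start to final")
    return max(candidates, key=parse_score)


# Hypothetical per-analysis weights standing in for a real parser's score.
TOY_WEIGHTS = {"b/PREP": 0.6, "clm/NOUN": 0.7, "bclm/NOUN": 0.4}


def toy_parse_score(path: List[str]) -> float:
    """Stand-in scorer: mean toy weight, so longer paths are not favored."""
    return sum(TOY_WEIGHTS.get(a, 0.0) for a in path) / len(path)


# One ambiguous surface token with two competing segmentations.
toy_lattice: Lattice = {
    0: [(1, "b/PREP"), (2, "bclm/NOUN")],
    1: [(2, "clm/NOUN")],
}

best = select_path(toy_lattice, 0, 2, toy_parse_score)
```

Exhaustive enumeration is only practical for small lattices. Normalizing by path length is one possible way to compare paths with different numbers of tokens and is an assumption of this sketch.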
