(Re)ranking Meets Morphosyntax: State-of-the-art Results from the SPMRL 2013 Shared Task

This paper describes the IMS-SZEGED-CIS contribution to the SPMRL 2013 Shared Task. We participate in both the constituency and dependency tracks and achieve state-of-the-art results for all languages. In both tracks, we make significant improvements through high-quality preprocessing and (re)ranking on top of strong baselines. Our system ranked first in both tracks.
