Statistical Parsing of Morphologically Rich Languages (SPMRL): What, How and Whither

The term Morphologically Rich Languages (MRLs) refers to languages in which significant information about syntactic units and relations is expressed at the word level. There is ample evidence that applying readily available statistical parsing models to such languages often leads to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that, despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs in parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to identify solutions shared across languages. The overarching analysis offers directions for future investigations.
