Parsing German: How Much Morphology Do We Need?

We investigate how the granularity of POS tags influences POS tagging and, furthermore, how POS tagging performance relates to parsing results. For this, we use the standard “pipeline” approach, in which a parser builds its output on previously tagged input. The experiments are performed on two German treebanks, using three POS tagsets of different granularity and six different POS taggers, together with the Berkeley parser. Our findings show that a less granular POS tagset leads to better tagging results. However, both too coarse-grained and too fine-grained distinctions at the POS level decrease parsing performance.
