Paper information - Why is German Dependency Parsing More Reliable than Constituent Parsing

In recent years, parsing research has expanded in several new directions. One of them concerns parsing languages other than English: treebanks have become available for many European languages as well as for Arabic, Chinese, and Japanese. However, it has been shown that parsing results on these treebanks depend on the type of treebank annotation used [ , ]. Another direction is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, which allows lexical information to be included in the parsing process in a much more natural way. Machine-learning-based approaches have been especially successful (cf. e.g. [12, 13]). The results achieved by these dependency parsers are very competitive, although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank [11] has been converted to dependencies. For this version, Nivre et al. [14] report an accuracy of 86.3%, as compared to an F-score of 2.1 for Charniak's parser [1]. The Penn Chinese Treebank [1 ] is also available in both a constituent and a dependency representation. The best results reported for parsing experiments with this treebank are an F-score of 81.8 for the constituent version [2] and . % accuracy for the dependency version [14]. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 5.3, depending on the treebank, NEGRA [1 ] or TüBa-D/Z [1 ]. The dependency parser based on a converted version of TüBa-D/Z, in contrast, reached an accuracy of 3.4% [14], i.e. 12 percentage points better than the best constituent analysis including grammatical functions.
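The abstract sets constituent F-scores against dependency accuracy rates and notes that such comparisons are difficult. As a minimal sketch of one reason why, assuming the usual definitions (a labeled-bracket F-score over constituent spans and a labeled attachment accuracy over tokens), the two figures count different things. The function names and the toy analyses below are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def bracket_f_score(gold, predicted):
    """Labeled-bracket F-score in percent over (label, start, end) spans."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection: a span only counts as often as it occurs in both.
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


def attachment_accuracy(gold, predicted):
    """Percentage of tokens whose (head, label) pair matches the gold analysis."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("analyses must be non-empty and cover the same tokens")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100 * correct / len(gold)


# Hypothetical four-token sentence; spans are (label, start, end).
gold_spans = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]
pred_spans = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("PP", 3, 4)]

# One (head index, relation label) pair per token; head 0 marks the root.
gold_deps = [(2, "SUBJ"), (0, "ROOT"), (4, "DET"), (2, "OBJ")]
pred_deps = [(2, "SUBJ"), (0, "ROOT"), (4, "DET"), (2, "PRED")]

constituent_f = bracket_f_score(gold_spans, pred_spans)
dependency_acc = attachment_accuracy(gold_deps, pred_deps)
```

The F-score is computed over however many brackets each tree contains, while the accuracy is computed over exactly one head per token, so the denominators depend on the annotation scheme in different ways.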

Sandra Kübler | Jelena Prokić

[1] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[2] Erich Drach, et al. Grundgedanken der deutschen Satzlehre, 1963.

[3] Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[4] Sandra Kübler. How Do Treebank Annotation Schemes Influence Parsing Results? Or How Not to Compare Apples And Oranges, 2005.

[5] Roger Levy, et al. Is it Harder to Parse Chinese, or the Chinese Treebank?, 2003, ACL.

[6] Erhard W. Hinrichs, et al. Is it Really that Difficult to Parse German?, 2006, EMNLP.

[7] Martha Palmer, et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus, 2005, Natural Language Engineering.

[8] Joakim Nivre, et al. Memory-Based Dependency Parsing, 2004, CoNLL.

[9] Helmut Schmid, et al. LoPar: Design and Implementation, 2000.

[10] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[11] Wojciech Skut, et al. An Annotation Scheme for Free Word Order Languages, 1997, ANLP.

[12] Wolfgang Maier, et al. Annotation Schemes and their Influence on Parsing Results, 2006, ACL.

[13] Joakim Nivre, et al. MaltParser: A Language-Independent System for Data-Driven Dependency Parsing, 2007, Natural Language Engineering.

[14] Koby Crammer, et al. Online Large-Margin Training of Dependency Parsers, 2005, ACL.

[15] David Chiang, et al. Recovering Latent Information in Treebanks, 2002, COLING.

[16] Wolfgang Menzel, et al. A broad-coverage parser for German based on defeasible constraints, 2008.