Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation

Recent parsing research has started addressing the questions a) how parsers trained on different syntactic resources differ in their performance and b) how to conduct a meaningful evaluation of the parsing results across such a range of syntactic representations. Two German treebanks, Negra and TuBa-D/Z, constitute an interesting testing ground for such research given that the two treebanks make very different representational choices for this language, which also is of general interest given that German is situated between the extremes of fixed and free word order. We show that previous work comparing PCFG parsing with these two treebanks employed PARSEVAL and grammatical function comparisons which were skewed by differences between the two corpus annotation schemes. Focusing on the grammatical dependency triples as an essential dimension of comparison, we show that the two very distinct corpora result in comparable parsing performance.

[1]  Josef van Genabith,et al.  Treebank Annotation Schemes and Parser Evaluation for German , 2007, EMNLP.

[2]  Dekang Lin,et al.  A dependency-based method for evaluating broad-coverage parsers , 1995, Natural Language Engineering.

[3]  Dan Klein,et al.  Fast Exact Inference with a Factored Model for Natural Language Parsing , 2002, NIPS.

[4]  Thorsten Brants,et al.  Interactive Corpus Annotation , 2000, LREC.

[5]  Josef van Genabith,et al.  Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited , 2007 .

[6]  Marga Reis On Justifying Topological Frames : 'Positional Field' and the Order of Nonverbal Constituents in German ⁰ , 1980 .

[7]  Frank Keller,et al.  Probabilistic Parsing for German Using Sister-Head Dependencies , 2003, ACL.

[8]  Roger Levy,et al.  Is it Harder to Parse Chinese, or the Chinese Treebank? , 2003, ACL.

[9]  Adriane Boyd,et al.  Discontinuity Revisited: An Improved Conversion to Context-Free Representations , 2007, LAW@ACL.

[10]  Wojciech Skut,et al.  SYNTACTIC ANNOTATION OF A GERMAN NEWSPAPER CORPUS , 2003 .

[11]  Erhard W. Hinrichs,et al.  Is it Really that Difficult to Parse German? , 2006, EMNLP.

[12]  James R. Curran,et al.  Formalism-Independent Parser Evaluation with CCG and DepBank , 2007, ACL.

[13]  Helmut Schmid,et al.  LoPar: Design and Implementation , 2000 .

[14]  Ted Briscoe,et al.  Parser evaluation: using a grammatical relation annotation scheme , 2003 .

[15]  Sandra Kübler How Do Treebank Annotation Schemes Influence Parsing Results? Or How Not to Compare Apples And Oranges , 2005 .

[16]  Amit Dubey,et al.  Statistical parsing for German: modeling syntactic properties and annotation differences , 2005 .