How Does the Granularity of an Annotation Scheme Influence Dependency Parsing Performance?

The common use of a single de facto standard annotation scheme for dependency treebank creation leaves open the question of to what extent the performance of an application trained on a treebank depends on that annotation scheme, and whether a linguistically richer scheme would degrade the application's performance. We investigate the effect of varying the number of grammatical relations in a tagset on the performance of dependency parsers. To obtain several levels of annotation granularity, we design a hierarchical annotation scheme based exclusively on syntactic criteria. The richest annotation contains 60 relations; the coarser-grained annotations are derived from it. As a result, all annotations, and thus the performance of parsers trained on them, remain comparable. We carried out experiments with four state-of-the-art dependency parsers. The results support the claim that annotating with more fine-grained syntactic relations does not necessarily entail a significant loss of accuracy. We also show the limits of this approach by detailing the fine-grained relations that do have a negative impact on parser performance.
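The derivation of coarser annotation levels from the richest one can be pictured as walking up a label hierarchy. The sketch below is a minimal illustration of that idea; the relation names and the hierarchy itself are hypothetical placeholders, not the paper's actual 60-relation tagset.

```python
# Hypothetical fine-grained -> parent (coarser) relation hierarchy.
# These labels are illustrative only, not the scheme from the paper.
HIERARCHY = {
    "dobj": "obj",   # direct object generalizes to object
    "iobj": "obj",   # indirect object generalizes to object
    "obj": "arg",    # objects generalize to argument
    "subj": "arg",   # subjects generalize to argument
    "arg": "dep",    # arguments generalize to generic dependent
    "mod": "dep",    # modifiers generalize to generic dependent
}

def coarsen(label: str, levels: int) -> str:
    """Map a fine-grained relation to a coarser one by walking
    `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        if label not in HIERARCHY:
            break  # reached the top of the hierarchy
        label = HIERARCHY[label]
    return label
```

Because every coarse label is obtained deterministically from the fine-grained one, a treebank annotated at the richest level can be re-labeled at any coarser level, which is what keeps parser results across granularities comparable.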
