Paper information - An Empirical Study of Differences between Conversion Schemes and Annotation Guidelines

We establish quantitative methods for comparing and estimating the quality of dependency annotations or conversion schemes. We use generalized tree-edit distance to measure divergence between annotations and propose theoretical learnability, derivational perplexity and downstream performance for evaluation. We present systematic experiments with tree-to-dependency conversions of the Penn-III treebank, as well as observations from experiments using treebanks from multiple languages. Our most important observations are: (a) parser bias makes most parsers insensitive to non-local differences between annotations, but (b) choice of annotation nevertheless has significant impact on most downstream applications, and (c) while learnability does not correlate with downstream performance, learnable annotations will lead to more robust performance across domains.

Anders Søgaard
