Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation

Methods for evaluating dependency parsing using attachment scores are highly sensitive to representational variation between dependency treebanks, making cross-experimental evaluation opaque. This paper develops a robust procedure for cross-experimental evaluation, based on deterministic unification-based operations for harmonizing different representations and a refined notion of tree edit distance for evaluating parse hypotheses relative to multiple gold standards. We demonstrate that, for different conversions of the Penn Treebank into dependencies, performance trends that are observed for parsing results in isolation change or dissolve completely when parse hypotheses are normalized and brought into the same common ground.
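The sensitivity of attachment scores to representational choices can be illustrated with a minimal sketch. It is not the procedure developed in the paper: the sentence, the two head conventions (main verb as head versus auxiliary as head), and the function name `unlabeled_attachment_score` are assumptions made purely for illustration.

```python
# Toy illustration (hypothetical data): the same parse scored against two
# gold standards that differ only in which word heads the verb group.
# Heads are 1-based token indices; 0 marks the root.

TOKENS = ["She", "has", "left", "the", "room"]

# Convention A (assumed): the main verb "left" heads the clause.
GOLD_MAIN_VERB_HEAD = [3, 3, 0, 5, 3]

# Convention B (assumed): the auxiliary "has" heads the clause.
GOLD_AUXILIARY_HEAD = [2, 0, 2, 5, 3]

# A parser hypothesis that happens to follow convention A exactly.
HYPOTHESIS = [3, 3, 0, 5, 3]


def unlabeled_attachment_score(hypothesis, gold):
    """Return the fraction of tokens whose predicted head equals the gold head."""
    if len(hypothesis) != len(gold):
        raise ValueError("hypothesis and gold must cover the same tokens")
    correct = sum(h == g for h, g in zip(hypothesis, gold))
    return correct / len(gold)


if __name__ == "__main__":
    for name, gold in [("main-verb head", GOLD_MAIN_VERB_HEAD),
                       ("auxiliary head", GOLD_AUXILIARY_HEAD)]:
        score = unlabeled_attachment_score(HYPOTHESIS, gold)
        print(f"UAS against {name} gold: {score:.2f}")
```

In this toy case the hypothesis is penalized on three of five tokens under the second convention even though it encodes the same syntactic analysis, which is the kind of effect that motivates normalizing representations before scoring.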
