Paper Information - A Gold Standard Dependency Corpus for English

A Gold Standard Dependency Corpus for English

We present a gold standard annotation of syntactic dependencies in the English Web Treebank (EWT) corpus using the Stanford Dependencies (SD) formalism. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for English informal text genres. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating the quality of different versions of the Stanford Parser, which includes a converter tool to produce dependency annotation from constituency trees. We show that training a dependency parser on a mix of newswire and web data leads to better performance on that type of data without hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be a valuable resource for parsing. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser’s dependency converter. In response to the challenges encountered by annotators in the EWT corpus, the formalism has been revised and extended, and the converter has been improved.
