Paper Information - Evaluating a Statistical CCG Parser on Wikipedia

Evaluating a Statistical CCG Parser on Wikipedia

The vast majority of parser evaluation is conducted on the 1984 Wall Street Journal (WSJ). In-domain evaluation of this kind is important for system development, but gives little indication about how the parser will perform on many practical problems. Wikipedia is an interesting domain for parsing that has so far been under-explored. We present statistical parsing results that for the first time provide information about what sort of performance a user parsing Wikipedia text can expect. We find that the C&C parser's standard model is 4.3% less accurate on Wikipedia text, but that a simple self-training exercise reduces the gap to 3.8%. The self-training also speeds up the parser on newswire text by 20%.

Joel Nothman | James R. Curran | Matthew Honnibal

[1] Jason Eisner. Efficient Normal-Form Parsing for Combinatory Categorial Grammar, 1996, ACL.

[2] Daniel Gildea, et al. Corpus Variation and Parser Performance, 2001, EMNLP.

[3] Tibor Kiss, et al. Unsupervised Multilingual Sentence Boundary Detection, 2006, CL.

[4] John Blitzer, et al. Frustratingly Hard Domain Adaptation for Dependency Parsing, 2007, EMNLP.

[5] Mark Steedman, et al. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank, 2007, CL.

[6] James R. Curran, et al. Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models, 2007, CL.

[7] Martin Kay, et al. Syntactic Process, 1979, ACL.

[8] Stephan Oepen, et al. Extracting and Annotating Wikipedia Sub-Domains — Towards a New eScience Community Resource, 2008.

[9] Stephen Clark, et al. Porting a lexicalized-grammar parser to the biomedical domain, 2009, J. Biomed. Informatics.

[10] Sebastian Riedel, et al. The CoNLL 2007 Shared Task on Dependency Parsing, 2007, EMNLP.

[11] Steven Bird, et al. NLTK: The Natural Language Toolkit, 2002, ACL.

[12] Mark Steedman, et al. Example Selection for Bootstrapping Statistical Parsers, 2003, NAACL.

[13] Srinivas Bangalore, et al. Supertagging: An Approach to Almost Parsing, 1999, CL.