Parser evaluation across Text Types

When a statistical parser is trained on one treebank, it is usually tested on another portion of the same treebank, partly because testing requires a comparable annotation format. But the user of a parser may not be interested in parsing sentences from the same newspaper over and over, and may even want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser trained only on the Brown corpus, although the latter was trained on only half as many sentences. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ corpus.
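
The abstract ties cross-text-type testing to having a comparable annotation format. The following Python sketch shows why, using labeled bracket scoring in the spirit of PARSEVAL, a common way of comparing parser output with a gold treebank. It is an illustration under stated assumptions, not the paper's evaluation code:

* Trees are written as hypothetical nested tuples `(label, child, ...)`, with words as plain strings.
* Preterminals (POS tags) are not scored.
* Details of the standard evalb program, such as its handling of punctuation, are left out.
* The helper names `labeled_spans`, `bracket_counts` and `corpus_scores` are made up for this example.

```python
from collections import Counter

# Hypothetical tree representation for this sketch: a word is a str,
# a constituent is a tuple (label, child1, child2, ...).

def labeled_spans(tree, start=0):
    """Return (spans, end): a multiset of (label, start, end) constituents
    covering tokens [start, end). Preterminals (POS tags) are skipped."""
    if isinstance(tree, str):
        return Counter(), start + 1
    label, *children = tree
    spans = Counter()
    pos = start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans.update(child_spans)  # add counts, keeping duplicate brackets
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        spans[(label, start, pos)] += 1
    return spans, pos

def bracket_counts(gold, predicted):
    """Matched, gold and predicted bracket counts for one sentence."""
    gold_spans, _ = labeled_spans(gold)
    pred_spans, _ = labeled_spans(predicted)
    matched = sum((gold_spans & pred_spans).values())  # multiset intersection
    return matched, sum(gold_spans.values()), sum(pred_spans.values())

def corpus_scores(pairs):
    """Micro-averaged labeled precision, recall and F1 over (gold, predicted) pairs."""
    matched = n_gold = n_pred = 0
    for gold, predicted in pairs:
        m, g, p = bracket_counts(gold, predicted)
        matched += m
        n_gold += g
        n_pred += p
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Scores are micro-averaged: matched, gold and predicted bracket counts are summed over all sentences before dividing. Running `corpus_scores` once on WSJ test pairs and once on Brown test pairs would therefore yield per-corpus figures of the kind compared above. The numbers are only meaningful when gold and output trees share tokenization and label set.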

Yannick Versley

[1] Frank Keller et al. Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French, 2005, ACL.

[2] Amit Dubey et al. What to Do When Lexicalization Fails: Parsing German with Suffix Analysis and Smoothing, 2005, ACL.

[3] Eugene Charniak et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[4] Sandra Kübler. How Do Treebank Annotation Schemes Influence Parsing Results? Or How Not to Compare Apples And Oranges, 2005.

[5] Daniel M. Bikel et al. Intricacies of Collins’ Parsing Model, 2004, CL.

[6] Helmut Schmid. Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors, 2004, COLING.

[7] Michael Schiehlen. Annotation Strategies for Probabilistic Parsing in German, 2004, COLING.

[8] Frank Henrik Müller. Annotating Grammatical Functions for German Using Finite-State Cascades, 2004, COLING.

[9] Wolfgang Menzel et al. Automatic Transformation of Phrase Treebanks to Dependency Trees, 2004, LREC.

[10] Dan Klein et al. Accurate Unlexicalized Parsing, 2003, ACL.

[11] Frank Keller et al. Probabilistic Parsing for German Using Sister-Head Dependencies, 2003, ACL.

[12] Sabine Schulte im Walde et al. Evaluation of the Gramotron Parser for German, 2002.

[13] Daniel Gildea et al. Corpus Variation and Parser Performance, 2001, EMNLP.

[14] Mitchell P. Marcus et al. Maximum Entropy Models for Natural Language Ambiguity Resolution, 1998.