Paper Information - Discriminant Ranking for Efficient Treebanking

Discriminant Ranking for Efficient Treebanking

Treebank annotation is a labor-intensive and time-consuming task. In this paper, we show that a simple statistical ranking model can significantly improve treebanking efficiency by directing human annotators, who are well trained in disambiguation tasks for treebanking but are not necessarily grammar experts, to the most relevant linguistic disambiguation decisions. Experiments were carried out to evaluate the impact of such techniques on annotation efficiency and quality. A detailed analysis of the ranking model's outputs shows a strong correlation with human annotator behavior. When integrated into the treebanking environment, the model brings a significant annotation speed-up together with improved inter-annotator agreement.

Yi Zhang | Valia Kordoni
