OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation

Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that includes word senses, predicate-argument structure, ontology linking, and coreference. To detect mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments examine the performance of WSD from three aspects: precision, cost-effectiveness ratio, and entropy. The results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline is better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find the approximately 2% of erroneous agreements remaining in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
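The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not describe the WSD model, its features, or the baseline, so `toy_predict`, the instance dictionaries, and the sense labels below are all hypothetical stand-ins. The sketch assumes only that an instance is flagged when a WSD prediction disagrees with the agreed annotation, that precision is the share of flagged instances a human reviewer confirms as errors, and that ambiguity can be summarized by the Shannon entropy of a word's sense distribution.

```python
import math
from collections import Counter


def flag_suspicious(instances, predict):
    """Return instances whose agreed annotation differs from the WSD prediction."""
    return [inst for inst in instances if predict(inst["context"]) != inst["sense"]]


def sense_entropy(senses):
    """Shannon entropy (in bits) of a word's annotated sense distribution."""
    counts = Counter(senses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def precision(flagged, is_error):
    """Fraction of flagged candidates that a human reviewer confirms as errors."""
    if not flagged:
        return 0.0
    return sum(1 for inst in flagged if is_error(inst)) / len(flagged)


def toy_predict(context):
    """Hypothetical keyword rule standing in for a trained WSD classifier."""
    if "river" in context or "stream" in context:
        return "bank.river"
    return "bank.finance"


# Hypothetical annotated instances of the word "bank"; instance 2 is a
# deliberately planted mistaken agreement.
instances = [
    {"id": 1, "context": "deposit money at the bank", "sense": "bank.finance"},
    {"id": 2, "context": "sat on the river bank", "sense": "bank.finance"},
    {"id": 3, "context": "the bank raised interest rates", "sense": "bank.finance"},
    {"id": 4, "context": "fishing from the bank of the stream", "sense": "bank.river"},
]

flagged = flag_suspicious(instances, toy_predict)
flagged_ids = [inst["id"] for inst in flagged]
entropy = sense_entropy([inst["sense"] for inst in instances])
# The lambda simulates the human reviewer's verdict on each flagged candidate.
flag_precision = precision(flagged, lambda inst: inst["id"] == 2)
```

Only the flagged instances go to human reviewers, which is what makes precision and a cost-effectiveness measure natural ways to judge such a filter.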

Chung-Hsien Wu | Liang-Chih Yu | Eduard H. Hovy
