Paper information - A structural approach to the automatic adjudication of word sense disagreements

Abstract: The semantic annotation of texts with senses from a computational lexicon is a complex and often subjective task. The fine granularity of the WordNet sense inventory [Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press], a de facto standard within the research community, is one of the main causes of both low inter-tagger agreement (between 70% and 80%) and the disappointing performance of automated fine-grained disambiguation systems (around 65% for the state of the art in the Senseval-3 English all-words task). To improve the performance of both manual and automated sense taggers, we can either change the sense inventory (e.g., by adopting a new dictionary or clustering WordNet senses) or resolve the disagreements between annotators by dealing with the fineness of sense distinctions. The former approach is not viable in the short term, as wide-coverage resources are not publicly available and no large-scale, reliable clustering of WordNet senses has been released to date. The latter approach requires the ability to distinguish between subtle or misleading sense distinctions. In this paper, we propose the use of structural semantic interconnections (a specific kind of lexical chain) for the adjudication of disagreed sense assignments to words in context. The approach exploits the structure of the lexicon to smooth out possible divergences between sense annotators and to foster coherent choices. We perform a twofold experimental evaluation of the approach, applied to manual annotations from the SemCor corpus and to automatic annotations from the Senseval-3 English all-words competition. Both sets of experiments and results are entirely novel: structural adjudication improves the state-of-the-art performance in all-words disambiguation by 3.3 points (achieving a 68.5% F1-score) and attains around 80% precision and 60% recall in the adjudication of disagreements between human annotators.

Roberto Navigli

[1] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[2] Graeme Hirst, et al. Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text, 1991, CL.

[3] James Pustejovsky, et al. The Generative Lexicon, 1995, CL.

[4] Longman. Longman Language Activator, 1993.

[5] George A. Miller, et al. A Semantic Concordance, 1993, HLT.

[6] William B. Dolan, et al. Word Sense Ambiguation: Clustering Related Senses, 1994, COLING.

[7] Graeme Hirst, et al. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms, 1995.

[8] Eneko Agirre, et al. Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation, 1997, ACL.

[9] David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[10] Adam Kilgarriff, et al. "I Don't Believe in Word Senses", 1997, Comput. Humanit.

[11] Regina Barzilay, et al. Using Lexical Chains for Text Summarization, 1997.