Using Corpus Statistics and WordNet Relations for Sense Identification

Corpus-based approaches to word sense identification offer flexibility and generality but suffer from a knowledge-acquisition bottleneck: they need sense-tagged training examples. We show how knowledge-based techniques can open this bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambiguate a noun, a verb, and an adjective. A knowledge base in the form of WordNet's lexical relations is used to automatically locate training examples in a general text corpus. Test results are compared with those obtained from manually tagged training examples.
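The two ideas above can be illustrated together: using lexical relatives of each sense to label training sentences automatically, and combining topical and local context in a single statistical classifier. The sketch below is illustrative only and is not the authors' exact model. It rests on several assumptions. It uses a naive Bayes combination with add-one smoothing. A hand-written `RELATIVES` table stands in for WordNet's lexical relations and lists words assumed to be monosemous relatives of two senses of *line*. It trains on a four-sentence toy corpus. The names `RELATIVES`, `locate_examples` and `SenseClassifier`, and all of the data, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: each sense of the target word "line" maps
# to relatives (e.g. synonyms, hypernyms, hyponyms) assumed to have one sense.
RELATIVES = {
    "line/cord": ["cord", "rope"],
    "line/queue": ["queue"],
}

def locate_examples(corpus, relatives):
    # Label each occurrence of a relative with the sense it points to; the
    # relative's position plays the role of the target word.
    examples = []
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            for sense, words in relatives.items():
                if tok in words:
                    examples.append((sense, tokens, i))
    return examples

def topical_features(tokens, i):
    # Topical context: every other word in the sentence, position ignored.
    return [t for j, t in enumerate(tokens) if j != i]

def local_features(tokens, i, width=2):
    # Local cues: words at fixed offsets from the target, tagged with the offset.
    return [f"{k:+d}:{tokens[i + k]}"
            for k in range(-width, width + 1)
            if k != 0 and 0 <= i + k < len(tokens)]

class SenseClassifier:
    def __init__(self):
        self.sense_counts = Counter()
        self.counts = {"topical": {}, "local": {}}
        self.vocab = {"topical": set(), "local": set()}

    def _views(self, tokens, i):
        return {"topical": topical_features(tokens, i),
                "local": local_features(tokens, i)}

    def train(self, examples):
        for sense, tokens, i in examples:
            self.sense_counts[sense] += 1
            for view, feats in self._views(tokens, i).items():
                self.counts[view].setdefault(sense, Counter()).update(feats)
                self.vocab[view].update(feats)

    def score(self, sense, tokens, i):
        # log P(sense) + sum of log P(feature | sense) over both views.
        total = sum(self.sense_counts.values())
        logp = math.log(self.sense_counts[sense] / total)
        for view, feats in self._views(tokens, i).items():
            c = self.counts[view].get(sense, Counter())
            denom = sum(c.values()) + len(self.vocab[view])
            for f in feats:
                logp += math.log((c[f] + 1) / denom)  # add-one smoothing
        return logp

    def classify(self, tokens, i):
        return max(self.sense_counts, key=lambda s: self.score(s, tokens, i))

corpus = [
    "she tied the boat with a strong rope",
    "pull the cord to start the engine",
    "we waited in a long queue for tickets",
    "the queue at the bank moved slowly",
]
clf = SenseClassifier()
clf.train(locate_examples(corpus, RELATIVES))
test = "they waited in the line for tickets".split()
print(clf.classify(test, test.index("line")))
```

Topical and local features use separate vocabularies. As a result, the same word can act as two distinct cues: for example, *the* occurring anywhere in the sentence and *the* immediately before the target.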
