Disambiguating Highly Ambiguous Words

A word sense disambiguator that can distinguish among the many senses of common words found in general-purpose, broad-coverage lexicons would be useful. For example, experiments have shown that, given accurate sense disambiguation, the lexical relations encoded in lexicons such as WordNet can be exploited to improve the effectiveness of information retrieval systems. This paper describes a classifier whose accuracy may be sufficient for such a purpose. The classifier combines the output of a neural network that learns topical context with the output of a network that learns local context to distinguish among the senses of highly ambiguous words. The accuracy of the classifier is tested on three words: the noun line, the verb serve, and the adjective hard. When forced to choose a sense for all test cases, the classifier has an average accuracy of 87%, 90%, and 81%, respectively. When the classifier is not forced to choose a sense and is trained on a subset of the available senses, it rejects test cases containing unknown senses as well as test cases it would misclassify if forced to select a sense. Finally, we describe an extension of our training method that uses information extracted from unlabeled examples to improve classification accuracy when few labeled training examples are available.
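
The combination and rejection behavior described above can be illustrated with a minimal Python sketch. The abstract does not say how the two networks' outputs are combined or how rejection is decided, so everything here is an assumption made for illustration: the function name combine_and_classify, the averaging rule, the reject_threshold parameter, and the sense labels and scores in the usage lines are all hypothetical.

```python
from typing import Optional, Sequence


def combine_and_classify(
    topical_scores: Sequence[float],
    local_scores: Sequence[float],
    senses: Sequence[str],
    reject_threshold: Optional[float] = None,
) -> Optional[str]:
    """Pick a sense from two per-sense score vectors, or reject.

    ASSUMPTION: scores are averaged; the paper's actual combination
    rule is not given in the abstract.
    """
    if not (len(topical_scores) == len(local_scores) == len(senses)):
        raise ValueError("score vectors and sense list must have equal length")
    if not senses:
        raise ValueError("at least one sense is required")

    # Average the topical-context and local-context scores per sense.
    combined = [(t + l) / 2.0 for t, l in zip(topical_scores, local_scores)]

    # Index of the highest combined score.
    best = max(range(len(senses)), key=combined.__getitem__)

    # Forced choice: no threshold, always return the top sense.
    # Otherwise reject (return None) when the top score is too low.
    if reject_threshold is not None and combined[best] < reject_threshold:
        return None
    return senses[best]


# Hypothetical scores and sense labels, for illustration only.
senses = ["cord", "phone", "text"]
topical = [0.7, 0.2, 0.1]
local = [0.5, 0.3, 0.2]

forced = combine_and_classify(topical, local, senses)
cautious = combine_and_classify(topical, local, senses, reject_threshold=0.8)
```

With no threshold the function always returns a sense, which corresponds to the forced-choice setting. With a threshold it returns None when the best combined score is too low, which is one simple way to model rejecting uncertain or unknown-sense cases.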
