Information Retrieval Based on Word Senses

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing for as fine-grained distinctions as is warranted by the information at hand, rather than supposing a fixed number of senses per word, and by allowing for more than one sense to be assigned to a given word occurrence. The algorithm is applied to the standard vector-space information retrieval model and an evaluation is performed over the Category B TREC-1 corpus (WSJ subcollection). Results show that this sense disambiguation algorithm improves performance by between 7% and 14% on average.
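The general idea of assigning one or more senses to a word occurrence from co-occurrence vectors can be illustrated with a minimal sketch. This is not the paper's implementation: the two-dimensional word vectors, the sense centroids for "bank", the function names, and the `min_sim` threshold are all hypothetical values chosen for illustration, and the centroids are simply given here rather than derived from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def context_vector(context_words, word_vectors):
    """Sum the co-occurrence vectors of the words surrounding an occurrence."""
    dims = len(next(iter(word_vectors.values())))
    total = [0.0] * dims
    for word in context_words:
        # Words without a vector contribute nothing (assumption of this sketch).
        for i, x in enumerate(word_vectors.get(word, [0.0] * dims)):
            total[i] += x
    return total

def assign_senses(ctx, sense_centroids, min_sim=0.5):
    """Return every sense whose centroid is similar enough to the context,
    most similar first, so an occurrence may receive more than one sense."""
    scored = [(cosine(ctx, centroid), sense)
              for sense, centroid in sense_centroids.items()]
    return [sense for sim, sense in sorted(scored, reverse=True) if sim >= min_sim]

# Hypothetical toy data: dimensions are (financial context, river context).
word_vectors = {
    "loan": [5.0, 0.0],
    "interest": [4.0, 0.0],
    "shore": [0.0, 4.0],
    "deposit": [3.0, 2.0],
}
# Hypothetical sense centroids for "bank"; in practice these would be
# derived from the corpus rather than written by hand.
bank_senses = {
    "bank/finance": [1.0, 0.1],
    "bank/river": [0.1, 1.0],
}

clear = assign_senses(context_vector(["loan", "interest"], word_vectors), bank_senses)
mixed = assign_senses(context_vector(["deposit", "shore"], word_vectors), bank_senses)
print(clear)  # a clearly financial context
print(mixed)  # a context close to both senses
```

With this toy data, the financial context is assigned a single sense, while the mixed context clears the threshold for both senses, which mirrors the abstract's point that a word occurrence need not be forced into exactly one sense.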
