Lexical ambiguity and information retrieval

Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem or about its impact on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors determine whether disambiguation will improve retrieval performance. For example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Other uses of word sense disambiguation in an information retrieval context are discussed.
