Document Ranking using an Enriched Thesaurus

A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and each document by using path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non-hierarchical relations are not. This paper shows that when the query explicitly mentions a particular non-hierarchical relation, the retrieval algorithm benefits from the presence of that relation in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database, whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non-hierarchical relations from a medical knowledge base. The algorithms ranked documents from Excerpta Medica against queries twice, once using EMTREE and once using the enriched EMTREE. When, and only when, the query specifically mentioned a particular non-hierarchical relation type, EMTREE enriched with that relation type led to a ranking that corresponded better to an expert's ranking.
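
The path-length idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy terms, the relation names (`is_a`, `treats`), the function names, and the choice of averaging shortest-path lengths over all query-term and document-term pairs are assumptions made for illustration only.

```python
from collections import deque
from itertools import product

# Hypothetical toy thesaurus: (term, term, relation type). Not EMTREE data.
EDGES = [
    ("concept", "disease", "is_a"),
    ("concept", "drug", "is_a"),
    ("disease", "arthritis", "is_a"),
    ("disease", "hepatitis", "is_a"),
    ("drug", "aspirin", "is_a"),
    ("aspirin", "arthritis", "treats"),  # non-hierarchical relation
]

def build_graph(edges, allowed_relations):
    """Build an undirected adjacency map using only the allowed relation types."""
    graph = {}
    for a, b, relation in edges:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if relation in allowed_relations:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def path_length(graph, start, goal):
    """Shortest number of edges between two terms (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return float("inf")  # terms are not connected

def distance(graph, query_terms, doc_terms):
    """Assumed measure: mean shortest-path length over all term pairs."""
    pairs = list(product(query_terms, doc_terms))
    return sum(path_length(graph, q, d) for q, d in pairs) / len(pairs)

def rank(graph, query_terms, documents):
    """Return document names ordered from closest to farthest from the query."""
    return sorted(documents, key=lambda name: distance(graph, query_terms, documents[name]))

plain = build_graph(EDGES, {"is_a"})               # hierarchical relations only
enriched = build_graph(EDGES, {"is_a", "treats"})  # plus one non-hierarchical type

query = {"arthritis"}
documents = {"doc_a": {"aspirin"}, "doc_b": {"hepatitis"}}

plain_ranking = rank(plain, query, documents)
enriched_ranking = rank(enriched, query, documents)
```

In this toy case the hierarchy alone places `doc_b` first, because "hepatitis" is a sibling of "arthritis", while adding the `treats` edge brings `doc_a` to the top. Whether such a change helps depends on the query, which is the question the paper examines.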
