Optimal Document-Indexing Vocabulary for MEDLINE

The indexing vocabulary is an important determinant of success in text retrieval. Researchers have compared the effectiveness of indexing using free-text and controlled vocabularies in a variety of text contexts. A number of studies have investigated the relative merits of free-text, MeSH and UMLS Metathesaurus indexing vocabularies for MEDLINE document indexing. Most of these studies conclude that controlled vocabularies offer no advantages in retrieval performance over free-text. This paper offers a detailed analysis of prior results and their underlying experimental designs. The analysis indicates that there are a number of open questions relevant to the overall debate on indexing vocabularies for MEDLINE. This paper also offers results from a new experiment assessing eight different retrieval strategies. These strategies involve document indexing via free-text, MeSH and several alternative combinations of the two vocabularies. The results indicate that MeSH does have an important role in text retrieval.
