Evaluation of query expansion using MeSH in PubMed

This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process, Automatic Term Mapping (ATM). We run Boolean searches on a collection of 55 topics and about 160,000 MEDLINE® citations used in the 2006 and 2007 TREC Genomics Tracks. For each topic, we first automatically construct a query by selecting keywords from the question. Each query is then expanded by ATM, which assigns different search tags to the terms in the query. We study three search tags after expansion, [MeSH Terms], [Text Words], and [All Fields], because all three make use of the MeSH field of indexed MEDLINE citations. We also characterize the two different mechanisms by which the MeSH field is used. Retrieval results using MeSH after expansion are compared with those based solely on the words in MEDLINE titles and abstracts. Aggregate retrieval performance is assessed using both F-measure and mean rank precision. Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.
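
As a rough illustration of the pipeline described above, the following Python sketch builds a Boolean query from a topic question and scores a retrieved set with the balanced F-measure. It is a minimal sketch under stated assumptions, not the paper's implementation: the stop-word list, the `build_query` and `f_measure` names, and the choice to AND all keywords together are hypothetical, and the real ATM expansion in PubMed is considerably more involved than attaching one tag to every term.

```python
import re

# Hypothetical stop-word list for illustration only; the list actually
# used in the study is not given in the abstract.
STOP_WORDS = {
    "what", "is", "the", "of", "in", "a", "an", "to", "and", "or",
    "for", "on", "are", "with", "by", "how", "do", "does",
}


def build_query(question, tag=None):
    """Select keywords from a question and join them with Boolean AND.

    If `tag` is given (e.g. "MeSH Terms"), it is appended to every keyword
    in square brackets. This only mimics the surface form of a tagged
    query; it does not reproduce ATM.
    """
    words = re.findall(r"[A-Za-z0-9-]+", question.lower())
    keywords = []
    for word in words:
        # Drop stop words and duplicates, keeping first-seen order.
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    if tag:
        keywords = [f"{word}[{tag}]" for word in keywords]
    return " AND ".join(keywords)


def f_measure(retrieved, relevant):
    """Balanced F-measure: harmonic mean of precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

For example, `build_query("What is the role of PrnP in mad cow disease?", "Text Words")` yields a query in which each remaining keyword carries the `[Text Words]` tag, and `f_measure` can then compare the citations such a query retrieves against the relevance judgments for that topic.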
