On Combining Text and MeSH Searches to Improve the Retrieval of MEDLINE documents

The MEDLINE database is the world's largest repository of biomedical abstracts. It remains a central information entry point for most biologists despite the growing availability of full-text articles on the Web. Each article is manually annotated with MeSH terms to allow easy access. To improve retrieval, the MeSH fields of MEDLINE records have been used successfully in the past with pseudo-relevance feedback and MeSH query expansion. However, previous experiments often ignored the structural information in the MeSH field. This paper investigates the impact of the MEDLINE MeSH field structure on a method that combines text and MeSH searches over a large subset of the MEDLINE database. Robertson's Offer Weight technique is used to generate MeSH queries. Our method is evaluated on the ad hoc task collection of the TREC 2005 Genomics Track, and our results show that this approach significantly improves retrieval performance.
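
As an illustration of how MeSH queries might be generated with Robertson's Offer Weight, the sketch below scores candidate MeSH terms drawn from a set of pseudo-relevant documents. It uses the standard Robertson/Sparck Jones formulation, OW = r × RW with RW = log[((r + 0.5)(N − n − R + r + 0.5)) / ((n − r + 0.5)(R − r + 0.5))]. The function names, the input layout, and the assumption that the feedback set is the top-ranked output of an initial text search are hypothetical choices for this example and are not taken from the paper.

```python
import math
from collections import Counter


def offer_weights(feedback_docs, doc_freq, num_docs):
    """Score MeSH terms with Robertson's Offer Weight (OW = r * RW).

    feedback_docs: list of MeSH term lists, one per pseudo-relevant document
                   (assumed here to come from an initial text search).
    doc_freq:      dict mapping each term to n, the number of documents
                   in the whole collection annotated with that term.
    num_docs:      N, the collection size.
    """
    R = len(feedback_docs)
    # r = number of feedback documents annotated with the term
    r_counts = Counter(term for doc in feedback_docs for term in set(doc))
    weights = {}
    for term, r in r_counts.items():
        n = doc_freq[term]
        # Relevance weight with the usual 0.5 smoothing
        rw = math.log(
            ((r + 0.5) * (num_docs - n - R + r + 0.5))
            / ((n - r + 0.5) * (R - r + 0.5))
        )
        weights[term] = r * rw
    return weights


def top_mesh_terms(feedback_docs, doc_freq, num_docs, k):
    """Return the k highest-scoring terms to use as a MeSH query."""
    weights = offer_weights(feedback_docs, doc_freq, num_docs)
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy data, invented purely for illustration
    docs = [["Humans", "Mice"], ["Mice", "Apoptosis"], ["Mice"]]
    freqs = {"Humans": 900, "Mice": 50, "Apoptosis": 10}
    print(top_mesh_terms(docs, freqs, num_docs=1000, k=2))
```

Terms that occur in many feedback documents but are rare in the collection receive the highest scores, while very common check tags such as "Humans" can end up with a negative weight and drop out of the generated query.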
