Combining Global and Local Semantic Contexts for Improving Biomedical Information Retrieval

This paper addresses the term mismatch problem in biomedical information retrieval (IR), in which a user query and the relevant documents in the collection express the same concepts with different terms, by relating the document's global context to the query's local context. Most solutions to this problem expand the query by discovering its context, either global or local. A global strategy examines word occurrences and relationships across all documents in the collection and uses this information to expand the original query. A local strategy examines only the top-ranked documents retrieved for a given query to select expansion terms. We propose combining the document's global context with the query's local context to increase the term overlap between the user query and the documents in the collection, through document expansion (DE) and query expansion (QE). The DE technique uses a statistical (IR-based) method to extract the most appropriate concepts (global context) from each document. The QE technique uses blind feedback over the top-ranked documents (local context) returned by the first retrieval stage. A comparative experiment on the TREC 2004 Genomics collection shows that the combination yields a significant improvement over the baseline: mean average precision (MAP) rises from 0.4097 to 0.4532, a relative improvement of +10.62%. In terms of MAP, the combined method also outperforms the official runs that participated in TREC 2004 Genomics and is comparable to the best run (0.4075).
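The local (blind feedback) strategy described above can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the TF-IDF scoring, the frequency-based choice of expansion terms, the toy documents, and the names `rank` and `expand_query` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def rank(query_terms, docs):
    """Return document indices ordered by a simple TF-IDF score, best first.

    Ties are broken by document index so the ordering is deterministic.
    """
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf)
        scored.append((score, i))
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def expand_query(query, docs, top_k=2, n_terms=2):
    """Blind feedback: add the most frequent new terms from the top_k documents."""
    q = tokenize(query)
    top = rank(q, docs)[:top_k]  # first retrieval stage (local context)
    counts = Counter(t for i in top for t in tokenize(docs[i]) if t not in q)
    return q + [t for t, _ in counts.most_common(n_terms)]


# Hypothetical toy collection; the last document shares no term with the query.
docs = [
    "brca1 mutation breast cancer tumor",
    "brca1 gene breast cancer tumor suppressor",
    "yeast cell cycle regulation",
    "breast tumor screening",
]

expanded = expand_query("brca1 cancer", docs)
before = rank(tokenize("brca1 cancer"), docs)
after = rank(expanded, docs)
```

In this toy collection the fourth document contains neither original query term, so it scores zero in the first stage. The expansion terms drawn from the two top-ranked documents overlap with it, so it moves ahead of the unrelated document in the second ranking, which is the term-overlap effect that query expansion aims for.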
