Ontology Graph Based Query Expansion for Biomedical Information Retrieval

Query expansion for biomedical information retrieval has been studied for over two decades, but most studies exploit only a single vocabulary, MeSH. We propose a completely different approach that draws on an arbitrary number of controlled vocabularies from the Metathesaurus. Experiments show that our ontology-based query expansion scheme achieves improvements of 8.2% and 17.7% over schemes using pseudo-relevance feedback query expansion and no query expansion, respectively. The average improvement over all other existing strategies is 24.8%. Furthermore, we identify generalized biomedical concepts as the cause of performance degradation.
