Supporting BioMedical Information Retrieval: The BioTracer Approach

The large volume and diversity of available biomedical information place high demands on existing search systems. Such systems should not only retrieve the sought information, but also filter out irrelevant documents and rank the relevant ones highest. Focusing on biomedical information, this work investigates how to improve a system's ability to find and rank relevant documents. To achieve this goal, we apply a series of information retrieval (IR) techniques to biomedical search and combine them in an optimal manner. These techniques include extending and using well-established IR similarity models, such as the Vector Space Model (VSM) and BM25, together with their underlying scoring schemes. The techniques also allow users to influence the ranking according to their own view of relevance. The techniques have been implemented and tested in a proof-of-concept prototype called BioTracer, which extends a Java-based open-source search engine library. The results of our experiments on the TREC 2004 Genomics Track collection are promising. Our investigation has also revealed that involving the user in the search process has a positive effect on the ranking of search results, and that the approaches used in BioTracer can help meet users' information needs.
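To make the idea of user-influenced ranking concrete, the following is a minimal sketch of BM25 scoring with a per-term boost. It is an illustration only, not BioTracer's implementation: the function names `bm25_scores` and `rank`, the `boosts` map, the non-negative IDF variant, and the defaults `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, boosts=None, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25.

    `boosts` is a hypothetical per-term weight map (default weight 1.0) that
    stands in for a user's own view of which terms matter most.
    """
    boosts = boosts or {}
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization shared by all terms of this document.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            saturation = tf[term] * (k1 + 1) / (tf[term] + length_norm)
            score += boosts.get(term, 1.0) * idf * saturation
        scores.append(score)
    return scores


def rank(query_terms, docs, boosts=None):
    """Return document indices ordered from highest to lowest score."""
    scores = bm25_scores(query_terms, docs, boosts)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus (made up for illustration).
docs = [
    ["brca1", "mutation", "cancer"],
    ["brca1", "protein", "binding", "protein"],
    ["yeast", "cell", "cycle"],
]
query = ["brca1", "mutation", "protein"]

default_order = rank(query, docs)
boosted_order = rank(query, docs, boosts={"mutation": 2.0})
```

In this toy corpus the second document ranks first by default, while doubling the weight of "mutation" moves the first document to the top, which shows how a user-supplied preference can reorder results without changing the underlying similarity model.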
