A Multiple-Stage Approach to Re-ranking Medical Documents

The widespread use of the Web has radically changed the way people acquire medical information. Every day, patients, caregivers, and doctors search the Web to satisfy their medical information needs. However, the results returned by existing medical search engines often contain irrelevant or uninformative documents that do not serve the users' purposes. To address this problem, this paper presents a method for re-ranking medical documents. The key idea is to compute more accurate similarity scores by re-ranking, in multiple stages, the initial set of documents retrieved by a search engine. Specifically, our method combines query expansion with abbreviations, query expansion with discharge summaries, clustering-based document scoring, centrality-based document scoring, and pseudo-relevance feedback with a relevance model. Experimental results from our participation in Task 3a of CLEF eHealth 2014 are used to assess the performance of our method.
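To make the idea of a re-ranking stage concrete, the following is a minimal sketch of one such stage, centrality-based document scoring, interpolated with the initial retrieval scores. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the bag-of-words cosine similarity, the PageRank-style iteration, the min-max normalization, and the names `centrality_scores`, `rerank`, `damping`, and `weight` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centrality_scores(docs, damping=0.85, iters=50):
    """PageRank-style centrality on a cosine-similarity graph of the documents.

    Assumption: whitespace tokenization and raw term frequencies; a real
    system would likely use a stronger similarity measure.
    """
    n = len(docs)
    if n == 0:
        return []
    vecs = [Counter(d.lower().split()) for d in docs]
    # Edge weights between distinct documents; no self-loops.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            # Documents with no outgoing weight pass on nothing.
            inflow = sum(scores[i] * sim[i][j] / out[i]
                         for i in range(n) if out[i] > 0)
            new.append((1 - damping) / n + damping * inflow)
        total = sum(new)
        scores = [s / total for s in new]  # renormalize to sum to 1
    return scores


def rerank(docs, initial_scores, weight=0.5):
    """Return document indices ordered by an interpolation of the
    min-max-normalized initial scores and centrality scores."""
    if not docs:
        return []

    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    a = norm(initial_scores)
    b = norm(centrality_scores(docs))
    combined = [(1 - weight) * x + weight * y for x, y in zip(a, b)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

In this sketch, a document that the search engine scored highly but that shares no vocabulary with the rest of the result list receives low centrality, so a sufficiently large `weight` moves it down the ranking. Further stages (query expansion, clustering-based scoring, relevance feedback) could be chained in the same way, each producing a score list that is interpolated with the previous one.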
