Improving health records search using multiple query expansion collections

The increasing prevalence of electronic health records (EHR), along with the need for enhanced clinical care, presents new challenges to information retrieval (IR). Many clinical decision-making tasks that follow the philosophy of Evidence-Based Medicine (EBM) rely on the ability to find relevant health records and gather sufficient clinical evidence under severe time constraints. In this work, we present a system built upon statistical IR methods for searching flat-text health records (i.e. the doctors' notes sections of EHR) for patients with particular conditions specified via a keyword query. In particular, we use multiple external repositories for query expansion and introduce two novel model weighting methods. Cross-validation results show that our system improves on a strong baseline by 30% in mean average precision (MAP) and performs promisingly overall when compared with a manual system doing the same task.
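
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how mean average precision is conventionally computed from ranked result lists and relevance judgments. It illustrates the standard definition only and is not the evaluation code used in this work; the function names, the `runs`/`qrels` data layout, and the toy identifiers are assumptions made for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (standard definition).

    `ranked` is the system's ranked list of record IDs and `relevant`
    is the set of IDs judged relevant. Both are hypothetical inputs.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant record is retrieved.
            precision_sum += hits / rank
    # Relevant records that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Toy data, invented purely for illustration.
runs = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
qrels = {"q1": {"a", "c"}, "q2": {"y"}}
map_score = mean_average_precision(runs, qrels)
```

In this toy example the first query scores (1/1 + 2/3) / 2 and the second scores 1/2, so `map_score` is their mean, roughly 0.667.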
