Aligned-Layer Text Search in Clinical Notes

Search techniques in clinical text need to make fine-grained semantic distinctions, since a medical term may be negated, may refer to someone other than the patient, or may apply to a time other than the present. Natural language processing (NLP) approaches address these fine-grained distinctions, but a task like patient cohort identification from electronic health records (EHRs) also requires a much more coarse-grained combination of evidence from the text and structured data in each patient’s health record. We therefore introduce aligned-layer language models, a novel approach to information retrieval (IR) that incorporates the output of other NLP systems. We show that this framework can represent standard IR queries, formulate multi-layered queries that were previously impossible, and customize the desired degree of linguistic granularity.
