Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval

Objective: Secondary use of electronic health record (EHR) data relies on accurate and complete retrieval of the relevant patient cohort, which requires searching both structured and unstructured data. Clinical text is difficult to search, although chart notes contain structure that may make retrieval more accurate.

Methods: We developed rules that identify sections in clinical documents. These sections can be indexed in search engines that support faceted search, such as Lucene or Essie, a search engine developed at the National Library of Medicine (NLM). We defined 22 clinical cohorts and wrote two queries for each cohort: one using section headings and one searching the whole document. We manually evaluated a subset of the retrieved documents to compare query performance.

Results: Section-based queries had lower recall than whole-document queries (0.83 vs 0.95) but higher precision (0.73 vs 0.54) and higher F1 (0.78 vs 0.69).

Conclusion: This evaluation suggests that searching specific sections may improve precision under certain conditions, often at the cost of recall.
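The Methods describe rule-based identification of note sections followed by a comparison of section-restricted and whole-document queries. The sketch below illustrates that idea under stated assumptions and is not the study's rule set or its Lucene/Essie configuration: the header variants in `HEADER_MAP`, the "header followed by a colon" pattern, the example notes, and the helper names `segment` and `matches` are all hypothetical, and plain substring matching stands in for a search engine's faceted index.

```python
import re
from typing import Dict, List, Optional

# Hypothetical header variants mapped to canonical section names;
# illustrative assumptions, not the rules used in the study.
HEADER_MAP = {
    "chief complaint": "chief_complaint",
    "cc": "chief_complaint",
    "history of present illness": "hpi",
    "hpi": "hpi",
    "past medical history": "pmh",
    "pmh": "pmh",
    "family history": "family_history",
    "medications": "medications",
    "assessment and plan": "assessment_plan",
}

# Assumption: a header is a known variant at the start of a line, followed by a colon.
HEADER_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(h) for h in sorted(HEADER_MAP, key=len, reverse=True)) + r")\s*:",
    re.IGNORECASE,
)


def segment(note: str) -> Dict[str, str]:
    """Split a note into canonical sections; text before the first header goes to 'preamble'."""
    sections: Dict[str, List[str]] = {}
    current = "preamble"
    for line in note.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = HEADER_MAP[match.group(1).lower()]
            line = line[match.end():]  # keep any text on the header line after the colon
        sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(p for p in parts if p) for name, parts in sections.items()}


def matches(note: str, term: str, section: Optional[str] = None) -> bool:
    """Case-insensitive substring match in one section, or in the whole note if section is None."""
    term = term.lower()
    if section is None:
        return term in note.lower()
    return term in segment(note).get(section, "").lower()


# Hypothetical notes: n2 mentions diabetes only in the family history.
notes = {
    "n1": "CC: chest pain\nHPI: 60M with chest pain.\nPMH: diabetes, hypertension",
    "n2": "CC: cough\nFamily history: mother had diabetes\nMedications: none",
}

section_hits = sorted(n for n, text in notes.items() if matches(text, "diabetes", section="pmh"))
whole_hits = sorted(n for n, text in notes.items() if matches(text, "diabetes"))
print(section_hits)  # ['n1']
print(whole_hits)    # ['n1', 'n2']
```

Restricting the query to the past-medical-history section excludes `n2`, whose only mention of diabetes is in the family history, which is the kind of false positive that section-based queries avoid. Conversely, a note whose relevant finding sits under a missing or nonstandard header would be missed by the section query, which is one way recall can drop.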

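The reported F1 values are consistent with the harmonic mean of the reported precision and recall. A short check in Python, assuming F1 is computed directly from the averaged precision and recall (an average of per-cohort F1 scores could differ slightly); the helper names `precision_recall` and `f1` and the example document IDs are hypothetical:

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported averages from the Results, as (precision, recall).
section_query = (0.73, 0.83)
whole_document_query = (0.54, 0.95)

print(round(f1(*section_query), 2))         # 0.78
print(round(f1(*whole_document_query), 2))  # 0.69
print(precision_recall({"n1", "n2"}, {"n1"}))  # (0.5, 1.0)
```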