Analyzing Patient Records to Establish If and When a Patient Suffered from a Medical Condition

The growth of digital clinical data has raised the question of how best to leverage this data in healthcare. Promising application areas include Information Retrieval and Question-Answering systems, both of which require an in-depth understanding of the texts they process. One aspect of this understanding is knowing whether a medical condition mentioned in a patient record is recent or occurred in the past. In addition, patient records often discuss other individuals, such as family members. This presents a second problem: determining whether a medical condition is experienced by the patient described in the report or by some other individual. In this paper, we investigate the suitability of a machine learning (ML) based system for resolving these tasks on a previously unexplored collection of Patient History and Physical Examination reports. Our results show that our novel Score-based feature approach outperforms the standard Linguistic and Contextual features described in the related literature. Specifically, near-perfect performance is achieved in resolving whether the patient experienced a condition. For the task of establishing when a patient experienced a condition, our ML system significantly outperforms the ConText system (87% versus 69% F-score, respectively).
