Electronic medical record phenotyping using the anchor and learn framework

BACKGROUND Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention. MATERIALS AND METHODS We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated using retrospective EMR data on emergency department patients using a set of prospectively gathered gold standard labels. RESULTS We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUC values are infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97. DISCUSSION The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients. CONCLUSION Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.

[1]  Hua Xu,et al.  Data from clinical notes: a perspective on the tension between structure and flexible documentation , 2011, J. Am. Medical Informatics Assoc..

[2]  Christopher G. Chute,et al.  A Genome-Wide Association Study of Red Blood Cell Traits Using the Electronic Medical Record , 2010, PloS one.

[3]  S. Brunak,et al.  Mining electronic health records: towards better research applications and clinical care , 2012, Nature Reviews Genetics.

[4]  C. Carlson,et al.  Genetic variants associated with the white blood cell count in 13,923 subjects in the eMERGE Network , 2011, Human Genetics.

[5]  Hongfang Liu,et al.  A Study of Transportability of an Existing Smoking Status Detection Module across Institutions , 2012, AMIA.

[6]  Christopher G Chute,et al.  Complement receptor 1 gene variants are associated with erythrocyte sedimentation rate. , 2011, American journal of human genetics.

[7]  Hua Xu,et al.  Portability of an algorithm to identify rheumatoid arthritis in electronic health records , 2012, J. Am. Medical Informatics Assoc..

[8]  Melissa A. Basford,et al.  Identification of Genomic Predictors of Atrioventricular Conduction: Using Electronic Medical Records as a Tool for Genome Science , 2010, Circulation.

[9]  Jonathan M. Teich,et al.  Grand challenges in clinical decision support , 2008, J. Biomed. Informatics.

[10]  Lucila Ohno-Machado Clinical machine learning , 2005, J. Biomed. Informatics.

[11]  George Hripcsak,et al.  Using narratives as a source to automatically learn phenotype models , 2014 .

[12]  David Sontag,et al.  Using Anchors to Estimate Clinical State without Labeled Data , 2014, AMIA.

[13]  Melissa A. Basford,et al.  Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. , 2011, American journal of human genetics.

[14]  David W. Bates,et al.  Improving completeness of electronic problem lists through clinical decision support: a randomized, controlled trial , 2012, J. Am. Medical Informatics Assoc..

[15]  Christopher G Chute,et al.  Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms. , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[16]  Shelley A. Rusincovitch,et al.  Clinical Research Informatics and Electronic Health Record Data , 2014, Yearbook of Medical Informatics.

[17]  Stephen B. Johnson,et al.  A review of approaches to identifying patient phenotype cohorts using electronic health records , 2013, J. Am. Medical Informatics Assoc..

[18]  Nigam H. Shah,et al.  Learning statistical models of phenotypes using noisy labeled training data , 2016, J. Am. Medical Informatics Assoc..

[19]  Melissa A. Basford,et al.  Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. , 2013, Journal of the American Medical Informatics Association : JAMIA.

[20]  Charles Elkan,et al.  Learning classifiers from only positive and unlabeled data , 2008, KDD.

[21]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[22]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[23]  I. Kohane,et al.  Development of phenotype algorithms using electronic medical records and incorporating natural language processing , 2015, BMJ : British Medical Journal.

[24]  Suzette J. Bielinski,et al.  Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study , 2012, J. Am. Medical Informatics Assoc..

[25]  D. Roden,et al.  The Emerging Role of Electronic Medical Records in Pharmacogenomics , 2011, Clinical pharmacology and therapeutics.

[26]  Peter D. Stetson,et al.  Use of Semantic Features to Classify Patient Smoking Status , 2008, AMIA.

[27]  Hua Xu,et al.  Comparative analysis of pharmacovigilance methods in the detection of adverse drug reactions using electronic medical records , 2013, J. Am. Medical Informatics Assoc..

[28]  S. Mani,et al.  Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases. , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[29]  Tejal K. Gandhi,et al.  Incomplete care--on the trail of flaws in the system. , 2011, The New England journal of medicine.

[30]  Jennifer G. Robinson,et al.  Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. , 2013, Journal of the American Medical Informatics Association : JAMIA.

[31]  J. Denny,et al.  Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis. , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.