Using Anchors to Estimate Clinical State without Labeled Data

We present a novel framework for learning to estimate and predict clinical state variables without labeled data. The resulting models can be used for electronic phenotyping, triggering clinical decision support, and cohort selection. The framework relies on key observations, which we characterize and term "anchor variables". By specifying anchor variables, an expert encodes a certain amount of domain knowledge about the problem, while the rest of the learning proceeds in an unsupervised manner. The ability to build anchors upon standardized ontologies and the framework's ability to learn from unlabeled data promote generalizability across institutions. We additionally develop a user interface that enables experts to choose anchor variables in an informed manner. We apply the framework to electronic medical record-based phenotyping to enable real-time decision support in the emergency department, and we validate the learned models against a prospectively gathered set of gold-standard responses from emergency physicians for nine clinically relevant variables.
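
To illustrate how an anchor variable can stand in for labels, the following is a minimal sketch under stated assumptions, not the method as specified in this abstract. It assumes a positive-and-unlabeled formulation: a classifier has already been trained to predict the presence of the anchor from the remaining features (with the anchor itself hidden), and its scores are then rescaled into estimates of the underlying clinical state. The function name `calibrate_anchor_scores`, its arguments, and the rule that an observed anchor implies a positive state are all hypothetical choices made for illustration.

```python
from statistics import mean


def calibrate_anchor_scores(scores, anchor_flags):
    """Rescale anchor-prediction scores into clinical-state estimates.

    scores       -- classifier outputs approximating P(anchor = 1 | x),
                    computed with the anchor itself removed from x
    anchor_flags -- 1 if the anchor was observed for that patient, else 0

    Assumption (illustrative): the anchor appears only when the state is
    positive, so the mean score over anchor-positive patients estimates
    c = P(anchor = 1 | state = 1), and P(state = 1 | x) ~ score / c.
    """
    if len(scores) != len(anchor_flags):
        raise ValueError("scores and anchor_flags must have equal length")

    positive_scores = [s for s, a in zip(scores, anchor_flags) if a == 1]
    if not positive_scores:
        raise ValueError("need at least one anchor-positive example")

    c = mean(positive_scores)
    if c <= 0:
        raise ValueError("anchor-positive scores must have a positive mean")

    estimates = []
    for s, a in zip(scores, anchor_flags):
        if a == 1:
            # Assumed: an observed anchor implies the state is present.
            estimates.append(1.0)
        else:
            # Rescale and clip so the estimate remains a valid probability.
            estimates.append(min(1.0, s / c))
    return estimates


if __name__ == "__main__":
    # Toy scores for four patients; the first two have the anchor recorded.
    print(calibrate_anchor_scores([0.5, 0.25, 0.1, 0.05], [1, 1, 0, 0]))
```

In practice the constant `c` would be estimated on held-out patients rather than on the data used to fit the classifier, and the rescaling is only as reliable as the assumption that the anchor is observed independently of the other features once the true state is known.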
