A Predictive Model for Medical Events Based on Contextual Embedding of Temporal Sequences

Background: Medical concepts are inherently ambiguous and error-prone due to human fallibility, which makes it hard for classical machine learning methods to use them fully (eg, for tasks such as early-stage disease prediction).

Objective: We aimed to create a new machine-friendly representation that captures the semantics of medical concepts, and then to develop a sequential predictive model for medical events based on this representation.

Methods: We developed novel contextual embedding techniques to combine different types of medical events (eg, diagnoses, prescriptions, and laboratory tests). Each medical event is converted into a numerical vector that reflects its "semantics," so that the similarity between medical events can be measured directly from the vectors. We then built simple and effective predictive models on these vectors to predict novel diagnoses.

Results: We evaluated our sequential prediction model (and standard learning methods) on estimating the risk of potential diseases from the contextual embedding representation. Using the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, our model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.79 on chronic systolic heart failure and an average AUC of 0.67 over the 80 most common diagnoses.

Conclusions: We propose a general early prognosis predictor for 80 different diagnoses. Our method computes a numeric representation for each medical event to uncover the potential meaning of those events. Our results demonstrate the effectiveness of the proposed method, which could benefit patients and physicians by supporting more accurate diagnosis.
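As a rough illustration of the two quantities the abstract relies on, the sketch below measures the similarity between medical-event vectors with cosine similarity and computes an AUC from predicted risk scores. It is not the authors' implementation: the abstract does not name a similarity measure, so cosine similarity is an assumption, and the event names, the three-dimensional vectors, and the helper functions (`cosine_similarity`, `most_similar`, `auc`) are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(event, embeddings):
    """Return the other event whose vector is closest to `event`."""
    target = embeddings[event]
    candidates = [name for name in embeddings if name != event]
    return max(candidates,
               key=lambda name: cosine_similarity(target, embeddings[name]))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical embeddings with made-up values; real vectors would be
# learned from patient records and have many more dimensions.
embeddings = {
    "dx:heart_failure": [0.9, 0.1, 0.3],
    "rx:furosemide": [0.8, 0.2, 0.4],
    "lab:glucose": [0.1, 0.9, 0.2],
}

nearest = most_similar("dx:heart_failure", embeddings)

# Made-up risk scores and true labels for four patients.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this toy setup the diagnosis vector lies closest to the prescription vector, and three of the four positive-negative pairs are ranked correctly. The pairwise AUC is quadratic in the number of patients, which is fine for a sketch but would normally be replaced by a rank-based computation on a dataset the size of MIMIC-III.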
