Personalized disease prediction using a CNN-based similarity learning method

Predicting a patient's risk of developing a given disease is an important research topic in healthcare. Personalized predictive modeling, which builds a specific model for each individual patient, has shown advantages over global models trained on the entire population in utilizing heterogeneous health data. Personalized predictive models draw on information from cohorts of similar patients in order to capture patient-specific characteristics. Accurately identifying and ranking similar patients based on their historical records is therefore a key step in personalized modeling. Electronic health records (EHRs), which are irregularly sampled and vary in the number of visits per patient, cannot be used directly to measure patient similarity because they lack an appropriate vector representation. In this paper, we build a novel time fusion CNN framework that simultaneously learns patient representations and measures pairwise similarity. Compared to a traditional CNN, our time fusion CNN learns not only the local temporal relationships but also the contribution of each time interval. Along with the similarity learning process, the output of the network, a probability distribution, is used to rank similar patients. Using the similarity scores, we perform personalized disease prediction and compare the effect of different vector representations and similarity learning metrics.
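The final step described above, ranking similar patients and using them for a personalized prediction, can be illustrated with a minimal sketch. It is not the paper's method: it assumes each patient has already been mapped to a fixed-length vector (in the paper these representations are learned by the time fusion CNN), and it swaps in plain cosine similarity with a similarity-weighted vote over the k most similar patients. The names `cosine_similarity`, `personalized_risk`, and the toy data are hypothetical.

```python
import numpy as np


def cosine_similarity(query, cohort):
    """Cosine similarity between one query vector and each row of cohort."""
    query_unit = query / np.linalg.norm(query)
    cohort_unit = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    return cohort_unit @ query_unit


def personalized_risk(query, cohort, labels, k=3):
    """Similarity-weighted average label of the k most similar patients.

    Assumption: cosine similarity stands in for the learned similarity score.
    """
    scores = cosine_similarity(query, cohort)
    top_k = np.argsort(scores)[::-1][:k]          # indices of the k highest scores
    weights = np.clip(scores[top_k], 0.0, None)   # ignore negative similarities
    if weights.sum() == 0:
        return float(labels[top_k].mean())        # fall back to an unweighted vote
    return float(weights @ labels[top_k] / weights.sum())


# Toy cohort (hypothetical): 4 patients, 3-dimensional representations,
# binary disease labels.
cohort = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])
labels = np.array([1, 1, 0, 0])
query = np.array([1.0, 0.05, 0.0])

risk_k2 = personalized_risk(query, cohort, labels, k=2)
risk_k4 = personalized_risk(query, cohort, labels, k=4)
print(risk_k2, risk_k4)
```

With k=2 only the two nearest patients, both positive, contribute to the estimate. With k=4 the two dissimilar negative patients are included but carry little weight, which is the intended effect of weighting by similarity.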
