Health Risk Prediction Using Big Medical Data - a Collaborative Filtering-Enhanced Deep Learning Approach

The massive amount of medical data accumulated from patients and healthcare providers has become a vast reservoir of knowledge that may enable promising applications such as predictive risk modeling, clinical decision support, and disease or safety surveillance. However, discovering knowledge from big medical data can be very difficult because of the nature of the data: it normally contains a large amount of unstructured content, it may have many missing values, and it can be highly complex and heterogeneous. To address these challenges, in this paper we propose a Collaborative Filtering-Enhanced Deep Learning approach. In particular, we estimate missing values based on patients’ similarity, i.e., we predict one patient’s missing features from the values of similar patients. This is implemented with the Collaborative Topic Regression method, which tightly couples a topic model with probabilistic matrix factorization and is able to utilize the rich information hidden in the data. A deep neural network-based method is then applied to predict health risks; this method helps us handle complex and multimodal data. Extensive experiments on a real-world dataset have been performed, and the results show that our proposed algorithm improves on state-of-the-art methods.
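The idea of predicting one patient's missing features from similar patients can be illustrated with a minimal sketch. This is not the Collaborative Topic Regression method used in the paper; it swaps in a much simpler neighbourhood-based collaborative filter with cosine similarity, and the function name `impute_by_patient_similarity`, the parameter `k`, and the toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def impute_by_patient_similarity(X, k=2):
    """Fill NaN entries with a similarity-weighted mean over the k most similar patients.

    Assumption: rows are patients, columns are numeric features, NaN marks a missing value.
    This is a plain neighbourhood-based filter, not Collaborative Topic Regression.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    n_patients = X.shape[0]

    for i in range(n_patients):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue

        # Cosine similarity to every other patient, using only co-observed features.
        sims = np.zeros(n_patients)
        for j in range(n_patients):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if not shared.any():
                continue
            a, b = X[i, shared], X[j, shared]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            sims[j] = (a @ b) / denom if denom > 0 else 0.0

        for f in missing:
            # Only patients who have the feature and a positive similarity can contribute.
            candidates = [j for j in range(n_patients) if observed[j, f] and sims[j] > 0]
            if not candidates:
                continue  # no usable neighbour: leave the value missing
            top = sorted(candidates, key=lambda j: sims[j], reverse=True)[:k]
            weights = sims[top]
            filled[i, f] = (weights @ X[top, f]) / weights.sum()

    return filled

# Toy example: patient 0 lacks feature 2; patients 1 and 2 are its closest neighbours.
X_demo = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [5.0, 0.0, 9.0],
])
filled_demo = impute_by_patient_similarity(X_demo, k=2)
```

In this toy matrix the two nearest neighbours of patient 0 have values 3 and 6 for the missing feature and nearly equal similarity, so the imputed value lies close to their average. A matrix-factorization approach such as the one named in the abstract would instead learn latent patient and feature vectors, which this sketch does not attempt.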
