Learning Deep Representations from Heterogeneous Patient Data for Predictive Diagnosis

Predictive diagnosis benefits both patients and hospitals. Major challenges limiting the effectiveness of machine-learning-based predictive diagnosis include the lack of efficient feature selection methods and the heterogeneity of measured patient data (e.g., vital signs). In this paper, we propose DLFS, an efficient feature selection scheme based on deep learning that is applicable to heterogeneous data. DLFS is unsupervised and automatically learns compact representations from patient data for efficient prediction. We investigate the specific problem of predicting patients' length of stay in hospital within a predictive diagnosis framework that uses DLFS for feature selection. Real patient data from the pneumonia database of the National University Health System (NUHS) in Singapore are used to verify the effectiveness of DLFS. Experiments on these data, with comparisons against several other commonly used feature selection methods, demonstrate the advantage of the proposed DLFS scheme.
