Spatio-temporal autoencoder for feature learning in patient data with missing observations

Modern patient data tend to be large-scale and multi-dimensional, containing both spatial and temporal features. Learning good spatio-temporal features from such data is challenging, especially when observations are missing. In this paper, we propose a spatio-temporal autoencoder (STAE), an unsupervised deep learning scheme that learns features from large-scale, high-dimensional patient data with missing observations. Through both spatial and temporal encoding, STAE automatically identifies patterns and dependencies in the patient data, even with missing values, and learns a compact representation of each patient for better classification. We evaluate STAE on publicly available electroencephalogram (EEG) data from the UCI Machine Learning Repository. In simulations, we compare STAE with several baseline feature selection methods and demonstrate its effectiveness in the presence of missing data.
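The abstract does not specify the STAE architecture or its loss, so the following is only a generic illustration of the idea, not the authors' implementation. It sketches one common way an autoencoder can tolerate missing observations: the reconstruction error is averaged over observed entries only. The names `masked_mse` and `encode`, the zero-filling of missing entries, the tanh activations, and the weight shapes are all assumptions made for this example.

```python
import numpy as np

def masked_mse(x, x_hat, mask):
    """Mean squared reconstruction error over observed entries only.

    x, x_hat: arrays of shape (channels, time); x may hold NaN where missing.
    mask: same shape, 1 where a value was observed and 0 where it is missing.
    """
    observed = mask.sum()
    if observed == 0:
        return 0.0
    # np.where discards the NaN differences at missing positions.
    diff = np.where(mask == 1, x - x_hat, 0.0)
    return float((diff ** 2).sum() / observed)

def encode(x, mask, w_spatial, w_temporal):
    """Hypothetical two-stage encoder: mix channels, then compress time."""
    x_filled = np.where(mask == 1, x, 0.0)    # zero-fill missing entries (assumption)
    spatial = np.tanh(w_spatial @ x_filled)   # shape (k, time)
    return np.tanh(spatial @ w_temporal)      # shape (k, d)

# Toy recording: 4 channels, 10 time steps, roughly 20% of entries missing.
rng = np.random.default_rng(0)
channels, time, k, d = 4, 10, 2, 3
x_full = rng.normal(size=(channels, time))
mask = (rng.random((channels, time)) > 0.2).astype(float)
x = np.where(mask == 1, x_full, np.nan)

code = encode(x, mask,
              rng.normal(size=(k, channels)),
              rng.normal(size=(time, d)))
print(code.shape)
```

Here `code` plays the role of the compact per-patient representation that a downstream classifier would consume; a trained model would fit the weights by minimising `masked_mse` between each input and its decoded reconstruction.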
