Autoencoders for health improvement by compressing the set of patient features

A key challenge in analyzing multimorbidity patterns in elderly people is managing the large number of characteristics associated with each patient. Diseases are the main variables for studying multimorbidity, but other variables should be considered to better classify the people included in each pattern: age, sex, social class and medication are frequently used to characterize each multimorbidity pattern. Consequently, the set of features that characterizes a patient has very high cardinality, and it is normally compressed into a patient vector of new variables whose dimension is noticeably smaller than that of the initial set. To minimize the information lost in compression, projection techniques based on Principal Component Analysis (PCA) have traditionally been used. Although these are generally a good option, the projection is linear, which reduces flexibility and limits performance. As an alternative to PCA-based techniques, this paper proposes the use of autoencoders and shows the improvement in the multimorbidity patterns obtained from the compressed database when the registered data of about a million patients (5 years’ follow-up) are processed. This work demonstrates that autoencoders retain a larger amount of information in each pattern and that the results are more consistent with clinical experience than those of other approaches frequently found in the literature.

Clinical relevance: From an epidemiological perspective, the contribution is relevant because it allows a more precise analysis of multimorbidity patterns, leading to better approaches to patient health strategies.
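The contrast between a linear PCA projection and a nonlinear autoencoder can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the method used in the paper: the synthetic binary patient-feature matrix, the code dimension `k = 3`, the single tanh encoding layer with a linear decoder, and the function names `pca_compress` and `train_autoencoder` are all hypothetical. The abstract does not specify the architecture or training setup actually used.

```python
import numpy as np

def pca_compress(X, k):
    """Linear baseline: project centered data onto the top-k principal directions."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # (d, k) projection matrix
    Z = Xc @ W                        # compressed patient vectors
    return Z, Z @ W.T + mu            # codes and linear reconstruction

def train_autoencoder(X, k, epochs=300, lr=0.5, seed=0):
    """Tiny autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.standard_normal((d, k)); b1 = np.zeros(k)
    W2 = 0.1 * rng.standard_normal((k, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)      # nonlinear encoding
        X_hat = Z @ W2 + b2           # decoding
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        G = 2.0 * err / err.size      # dLoss/dX_hat
        dZ = G @ W2.T                 # backpropagate through the decoder
        dH = dZ * (1.0 - Z ** 2)      # backpropagate through tanh
        W2 -= lr * (Z.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    Z = np.tanh(X @ W1 + b1)
    return Z, Z @ W2 + b2, losses

# Hypothetical toy data: 200 "patients", 12 binary features (e.g. disease flags),
# drawn from 3 latent patterns with different feature probabilities.
rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.9, size=(3, 12))
labels = rng.integers(0, 3, size=200)
X = (rng.random((200, 12)) < probs[labels]).astype(float)

k = 3
Z_pca, X_pca = pca_compress(X, k)
Z_ae, X_ae, losses = train_autoencoder(X, k)
print("PCA reconstruction MSE:        ", np.mean((X - X_pca) ** 2))
print("Autoencoder reconstruction MSE:", np.mean((X - X_ae) ** 2))
```

Both functions return a compressed matrix with `k` columns per patient, which could then be passed to a clustering step. On toy data of this size the two reconstruction errors are not meaningful evidence either way; the sketch only shows where the nonlinearity enters the compression.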
