Learning Multimorbidity Patterns from Electronic Health Records Using Non-negative Matrix Factorisation

Multimorbidity, the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. Nevertheless, it remains poorly understood, and existing research offers only limited evidence on its burden, determinants and consequences. Previous studies of multimorbidity patterns are often cross-sectional and do not explicitly account for how those patterns evolve over time; some are based on small datasets and/or use arbitrary and narrow age ranges; and those that employ advanced models usually lack appropriate benchmarking and validation. In this study, we (1) introduce a novel approach for using Non-negative Matrix Factorisation (NMF) for temporal phenotyping (i.e., simultaneously mining disease clusters and their trajectories); (2) provide quantitative metrics for the evaluation of these clusters and trajectories; and (3) demonstrate how the temporal characteristics of the disease clusters that result from our model can help mine multimorbidity networks and generate new hypotheses for the emergence of various multimorbidity patterns over time. We trained and evaluated our models on one of the world's largest electronic health records (EHR) datasets, containing more than 7 million patients, of whom over 2 million were relevant to, and hence included in, this study.
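
As a rough illustration of the factorisation idea only, the sketch below runs plain NMF with the standard multiplicative updates for the Frobenius objective on a tiny made-up matrix. It is not the temporal phenotyping model of this study: the toy data, the reading of rows as patient-age windows, the function name `nmf_multiplicative` and the choice of two components are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    Hypothetical helper written for this illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (invented): rows are assumed to be patient-age windows,
# columns are four diseases; 1 means the disease is recorded in that window.
V = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
], dtype=float)

W, H = nmf_multiplicative(V, rank=2)

# Rows of H can be read as disease clusters (loadings over diseases);
# rows of W show how strongly each window expresses each cluster.
error = np.linalg.norm(V - W @ H)
print(np.round(H, 2))
print(np.round(W, 2))
print("reconstruction error:", round(float(error), 4))
```

Under this reading, the component weights in `W` for successive windows of the same patient would trace how strongly each cluster is expressed over time, which is one simple way to picture a trajectory.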
