Supervised multi-specialist topic model with applications on large-scale electronic health record data

Motivation: Electronic health record (EHR) data offer a new avenue for elucidating disease comorbidities and latent phenotypes for precision medicine. To fully exploit this potential, a realistic generative process for EHR data needs to be modelled.

Materials and Methods: We present MixEHR-S to jointly infer specialist-disease topics from EHR data. As the key contribution, we model specialist assignments and ICD-coded diagnoses as latent topics based on each patient's underlying disease-topic mixture, within a novel, unified, supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S.

Results: We applied MixEHR-S to two independent large-scale EHR databases in Quebec in three targeted applications: (1) congenital heart disease (CHD) diagnostic prediction among 154,775 patients; (2) chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; and (3) prediction of future insulin treatment among 78,712 patients diagnosed with diabetes, as a means of assessing disease exacerbation. In all three applications, the most predictive latent topics inferred by MixEHR-S included clinically meaningful ones, and MixEHR-S achieved higher target prediction accuracy than existing methods. This provides opportunities to prioritize high-risk patients for healthcare services.

Availability and implementation: MixEHR-S source code and scripts for the experiments are freely available at https://github.com/li-lab-mcgill/mixehrS
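The abstract mentions a closed-form collapsed variational inference algorithm but gives no equations. As a rough illustration only, the sketch below implements CVB0, the zeroth-order collapsed variational Bayes update commonly used for LDA. It assumes a simplified model in which one per-patient topic mixture generates two token types (ICD codes and specialist labels), each with its own topic-token distributions. This is a toy under stated assumptions, not MixEHR-S: the supervised response and any specialist-specific structure of the real model are omitted. The function name `cvb0`, its parameters (`alpha`, `beta`, `n_iters`, `seed`), and the toy data are hypothetical.

```python
import random


def cvb0(patients, vocab_sizes, n_topics, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Toy CVB0 for a multi-modal topic model (illustrative, NOT MixEHR-S).

    patients: list of patients, each a list of (modality, token_id) pairs,
        e.g. modality 0 = ICD code, modality 1 = specialist label (assumed layout).
    vocab_sizes: vocabulary size of each modality.
    Returns theta, where theta[d][k] estimates patient d's weight on topic k.
    """
    rng = random.Random(seed)
    K, M = n_topics, len(vocab_sizes)
    n_dk = [[0.0] * K for _ in patients]                                    # patient-topic counts
    n_kw = [[[0.0] * vocab_sizes[m] for _ in range(K)] for m in range(M)]   # topic-token counts
    n_k = [[0.0] * K for _ in range(M)]                                     # topic totals per modality
    gamma = []                                                              # gamma[d][i][k] = q(z_di = k)

    def add(d, m, w, g, sign):
        # Add (sign=+1) or remove (sign=-1) one token's soft assignment from the expected counts.
        for k in range(K):
            n_dk[d][k] += sign * g[k]
            n_kw[m][k][w] += sign * g[k]
            n_k[m][k] += sign * g[k]

    # Random soft initialisation of every token's topic responsibilities.
    for d, doc in enumerate(patients):
        gamma.append([])
        for m, w in doc:
            g = [rng.random() for _ in range(K)]
            total = sum(g)
            g = [x / total for x in g]
            gamma[d].append(g)
            add(d, m, w, g, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(patients):
            for i, (m, w) in enumerate(doc):
                add(d, m, w, gamma[d][i], -1)  # exclude token di ("-di" counts)
                V = vocab_sizes[m]
                # CVB0 update: (alpha + n_dk) * (beta + n_kw) / (V * beta + n_k)
                g = [(alpha + n_dk[d][k]) * (beta + n_kw[m][k][w]) / (V * beta + n_k[m][k])
                     for k in range(K)]
                total = sum(g)
                g = [x / total for x in g]
                gamma[d][i] = g
                add(d, m, w, g, +1)  # put the updated token back

    # Posterior-mean topic mixture of each patient.
    return [[(alpha + n_dk[d][k]) / (K * alpha + len(doc)) for k in range(K)]
            for d, doc in enumerate(patients)]


# Hypothetical toy data: two groups of patients with disjoint ICD codes and specialists.
group_a = [(0, 0), (0, 1)] * 5 + [(1, 0)] * 2   # ICD codes 0/1, specialist 0
group_b = [(0, 2), (0, 3)] * 5 + [(1, 1)] * 2   # ICD codes 2/3, specialist 1
patients = [group_a, group_a, group_b, group_b]

theta = cvb0(patients, vocab_sizes=[4, 2], n_topics=2)
dominant = [max(range(2), key=row.__getitem__) for row in theta]
print(dominant)
```

Each token's responsibilities are recomputed from expected counts with that token removed, so no sampling is needed; this deterministic update is what distinguishes CVB0 from collapsed Gibbs sampling. In a jointly supervised topic model such as supervised LDA, the target label (here it might be a CHD or COPD diagnosis) would also enter these updates; that term is deliberately left out of this sketch.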

Aman Verma | David Buckeridge | Guido Powell | Ziyang Song | Aihua Liu | Liming Guo | Ariane Marelli | Yue Li | Xavier Sumba Toral | Yixin Xu