Temporal Phenotyping using Deep Predictive Clustering of Disease Progression

With the wider availability of modern electronic health records, patient care data is often stored as time series. Clustering such time-series data is crucial for patient phenotyping, for anticipating patients' prognoses by identifying "similar" patients, and for designing treatment guidelines tailored to homogeneous patient subgroups. In this paper, we develop a deep learning approach for clustering time-series data in which each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities). To encourage each cluster to have homogeneous future outcomes, the clustering is carried out by learning discrete representations that best describe the future outcome distribution, using novel loss functions. Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.
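The idea of grouping patients so that each cluster has a homogeneous outcome distribution can be illustrated with a small sketch. This is not the paper's model or its loss functions: it assumes the time series have already been encoded into fixed-length vectors, uses nearest-centroid assignment as the discrete representation, and scores homogeneity with a plain negative log-likelihood. All function names, the toy data, and the choice of Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def assign_clusters(embeddings, centroids):
    """Index of the nearest centroid (Euclidean) for each embedding."""
    # dists has shape (n_patients, n_clusters)
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def cluster_outcome_distributions(assignments, outcomes, n_clusters, n_outcomes):
    """Empirical outcome distribution per cluster (all-zero row if a cluster is empty)."""
    counts = np.zeros((n_clusters, n_outcomes))
    np.add.at(counts, (assignments, outcomes), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def predictive_loss(assignments, outcomes, cluster_dists, eps=1e-12):
    """Mean negative log-likelihood of each outcome under its cluster's distribution."""
    probs = cluster_dists[assignments, outcomes]
    return float(-np.mean(np.log(probs + eps)))

# Toy data (hypothetical): four encoded patients, two centroids, binary outcome.
embeddings = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
outcomes = np.array([0, 0, 1, 1])

assignments = assign_clusters(embeddings, centroids)
cluster_dists = cluster_outcome_distributions(assignments, outcomes, 2, 2)
loss = predictive_loss(assignments, outcomes, cluster_dists)
```

In this toy case each cluster contains patients with a single outcome, so the loss is close to zero; mixing outcomes within a cluster would raise it. A learned model would instead train the encoder and centroids so that this kind of outcome-based loss decreases.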
