Clinically Meaningful Comparisons Over Time: An Approach to Measuring Patient Similarity based on Subsequence Alignment

Longitudinal patient data have the potential to improve clinical risk stratification models for disease. However, chronic diseases that progress slowly over time are often heterogeneous in their clinical presentation. Patients may progress through disease stages at varying rates. This leads to pathophysiological misalignment over time, making it difficult to consistently compare patients in a clinically meaningful way. Furthermore, patients present clinically for the first time at different stages of disease. This eliminates the possibility of simply aligning patients based on their initial presentation. Finally, patient data may be sampled at different rates due to differences in schedules or missed visits. To address these challenges, we propose a robust measure of patient similarity based on subsequence alignment. Compared to global alignment techniques that do not account for pathophysiological misalignment, focusing on the most relevant subsequences allows for an accurate measure of similarity between patients. We demonstrate the utility of our approach in settings where longitudinal data, while useful, are limited and lack a clear temporal alignment for comparison. Applied to the task of stratifying patients for risk of progression to probable Alzheimer's Disease, our approach outperforms models that use only snapshot data (AUROC of 0.839 vs. 0.812) and models that use global alignment techniques (AUROC of 0.822). Our results support the hypothesis that patients' trajectories are useful for quantifying inter-patient similarities and that subsequence matching can help account for heterogeneity and misalignment in longitudinal data.
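
To make the contrast between global and subsequence alignment concrete, the following is a minimal sketch of one common formulation, subsequence dynamic time warping, and is not necessarily the exact similarity measure proposed here. The function name `dtw_distance`, the absolute-difference local cost, and the univariate toy series are all assumptions chosen for illustration; real patient trajectories would be multivariate and irregularly sampled.

```python
import numpy as np


def dtw_distance(x, y, subsequence=False):
    """Dynamic time warping cost between 1-D series x and y.

    Hypothetical helper for illustration only.
    subsequence=False: global alignment, both series matched end to end.
    subsequence=True:  x is matched to the best contiguous stretch of y,
                       so the alignment may start and end anywhere in y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # D[i, j] = minimal accumulated cost of aligning x[:i] with y[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    if subsequence:
        # Free start: skipping any prefix of y costs nothing.
        D[0, :] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    if subsequence:
        # Free end: take the best end point anywhere in y.
        return float(D[n, 1:].min())
    return float(D[n, m])


# Toy example: patient A's short record matches the middle of patient B's.
patient_a = [1.0, 2.0, 3.0]
patient_b = [9.0, 9.0, 1.0, 2.0, 3.0, 9.0]

global_cost = dtw_distance(patient_a, patient_b)
subseq_cost = dtw_distance(patient_a, patient_b, subsequence=True)
print(global_cost, subseq_cost)
```

In this toy case the subsequence cost is zero because patient A's record matches a stretch of patient B's exactly, while the global cost is inflated by the unmatched start and end of B's record. That is the kind of misalignment penalty the abstract attributes to global alignment techniques.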
