Leveraging Patient Similarity and Time Series Data in Healthcare Predictive Models

Patient time series classification is challenged by high dimensionality and a high degree of missingness. Guided by patient similarity theory, this study explores temporal feature engineering and reduction, missing value imputation, and change point detection methods that can improve the accuracy of similarity-based classification models. We select a piecewise aggregate approximation method to extract fine-grained temporal features and propose a minimalist method to impute missing values in those features. For dimensionality reduction, we adopt a gradient descent search method for feature weight assignment. We propose new definitions of patient status and directional change, based on medical knowledge or clinical guidelines about the value ranges for different patient status levels, and develop a method to detect change points that indicate positive or negative changes in patient status. We evaluate the proposed methods in the context of early Intensive Care Unit (ICU) mortality prediction. The results show that a k-Nearest Neighbor algorithm incorporating the methods we select and propose significantly outperforms the relevant benchmarks for early ICU mortality prediction. This study contributes to time series classification and early ICU mortality prediction by identifying and enhancing temporal feature engineering and reduction methods for similarity-based time series classification.
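
To make the feature pipeline more concrete, the following is a minimal Python sketch of piecewise aggregation over a series with missing values, followed by a feature-weighted distance of the kind a similarity-based classifier could use. It is an illustration under stated assumptions, not the study's implementation: the function names (`paa`, `impute_segments`, `weighted_distance`) are hypothetical, the carry-forward fill stands in for the minimalist imputation method, whose details the abstract does not give, and the weights are supplied by hand rather than learned by gradient descent search.

```python
import numpy as np


def paa(series, n_segments):
    """Piecewise aggregation: mean of the observed values in each segment.

    NaN marks a missing observation. A segment with no observed values
    yields NaN so that it can be imputed afterwards.
    """
    values = np.asarray(series, dtype=float)
    out = np.full(n_segments, np.nan)
    for i, segment in enumerate(np.array_split(values, n_segments)):
        observed = segment[~np.isnan(segment)]
        if observed.size:
            out[i] = observed.mean()
    return out


def impute_segments(features, fallback):
    """Assumed fill rule (not from the source): carry the last observed
    segment value forward; use `fallback` for gaps at the start."""
    filled = np.array(features, dtype=float)
    last = fallback
    for i, value in enumerate(filled):
        if np.isnan(value):
            filled[i] = last
        else:
            last = value
    return filled


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two feature vectors."""
    a, b, weights = (np.asarray(x, dtype=float) for x in (a, b, weights))
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))


# Example: eight hourly readings reduced to four temporal features.
readings = [1.0, 2.0, 3.0, 4.0, np.nan, np.nan, 7.0, 9.0]
features = impute_segments(paa(readings, 4), fallback=0.0)
```

A k-Nearest Neighbor classifier would then rank training patients by `weighted_distance` to the query patient's feature vector; a weight near zero effectively removes a feature, which is how weight assignment can serve as dimensionality reduction.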
