Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data

Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. These missing values are often ignored, but doing so can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, whether a variable such as blood glucose is measured may depend on its prior values as well as those of other variables. These dependencies also exist across time, but current methods have yet to incorporate these temporal relationships together with multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, one based on an extension to k-NN and one based on the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring), the proposed method has the highest imputation accuracy. This held when up to half the data were missing and when runs of consecutive missing values made up a significant fraction of the overall time series length.
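To make the idea of combining the two imputers concrete, the following is a minimal sketch in Python with NumPy. It is not the published FLk-NN algorithm: the function names, the parameters `k`, `max_lag`, and `keep`, the low-pass Fourier reconstruction, the distance over shared lagged features, and the simple averaging of the two estimates are all assumptions chosen for illustration.

```python
import numpy as np

def fourier_impute(x, keep=5):
    """Fill NaNs in a 1-D series with a low-pass Fourier reconstruction (illustrative)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any() or missing.all():
        return x.copy()
    idx = np.arange(len(x))
    # Initial guess: linear interpolation between observed points.
    filled = x.copy()
    filled[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    # Keep only the `keep` lowest-frequency components (assumed cutoff).
    spectrum = np.fft.rfft(filled)
    spectrum[keep:] = 0
    smooth = np.fft.irfft(spectrum, n=len(x))
    out = x.copy()
    out[missing] = smooth[missing]
    return out

def lagged_knn_impute(X, k=3, max_lag=2):
    """Fill NaNs in a (time, variables) array from the k time points
    whose lagged history, across all variables, is most similar."""
    X = np.asarray(X, dtype=float)
    T, V = X.shape
    # Row t of `hist` holds every variable at t-1, ..., t-max_lag.
    hist = np.full((T, V * max_lag), np.nan)
    for lag in range(1, max_lag + 1):
        hist[lag:, (lag - 1) * V: lag * V] = X[:-lag]
    out = X.copy()
    for t, j in zip(*np.where(np.isnan(X))):
        candidates = np.where(~np.isnan(X[:, j]))[0]
        diffs = hist[candidates] - hist[t]
        n_shared = (~np.isnan(diffs)).sum(axis=1)
        ok = n_shared > 0
        if not ok.any():
            continue  # no comparable history; leave this entry as NaN
        # Root-mean-square distance over the lagged features both rows share.
        dist = np.sqrt(np.nansum(diffs[ok] ** 2, axis=1) / n_shared[ok])
        nearest = candidates[ok][np.argsort(dist)[:k]]
        out[t, j] = X[nearest, j].mean()
    return out

def combined_impute(X, k=3, max_lag=2, keep=5):
    """Average the two estimates; fall back to whichever one is available."""
    X = np.asarray(X, dtype=float)
    knn = lagged_knn_impute(X, k, max_lag)
    fourier = np.column_stack(
        [fourier_impute(X[:, j], keep) for j in range(X.shape[1])]
    )
    est = np.where(np.isnan(knn), fourier,
                   np.where(np.isnan(fourier), knn, (knn + fourier) / 2))
    return np.where(np.isnan(X), est, X)
```

Calling `combined_impute(X)` on an array with one row per time point and one column per variable returns a copy in which observed values are unchanged and missing entries are filled. Because the Fourier step works on each variable separately, it can still produce an estimate when every variable is missing at a time point, while the lagged k-NN step draws on earlier values of all variables.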
