Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data

One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more likely to be collected in a misaligned, uneven fashion and/or to be partly lost or unavailable (missing data). Because these problems severely affect data mining techniques, this work proposes methods to (a) align misaligned, unevenly sampled data, (b) distinguish absent values caused by low sampling frequencies from those resulting from missingness mechanisms, and (c) classify recoverable and non-recoverable segments of missing data using statistical and fuzzy modeling approaches. These methods were evaluated on randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the most frequently sampled variable as a template, combined with cubic interpolation, made it possible to realign misaligned, unevenly sampled data without significant errors; (2) absent values due to low sampling frequencies can be successfully distinguished from truly missing values using 95% confidence intervals relative to the mean sampling time; (3) fuzzy modeling gave better classification results for recoverable segments, while the statistical approach performed better in classifying non-recoverable segments. The performance of all three methods decreased as the amount of missing data in the test datasets increased.
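The abstract does not give implementation details, so the following Python sketch illustrates only one possible reading of methods (a) and (b), under explicit assumptions. For (a), every other variable is resampled onto the time stamps of the template variable (the one with the most samples, taken as a proxy for "most frequently sampled") using a natural cubic spline; the choice of a natural spline is our assumption, not necessarily the authors' cubic scheme. For (b), a gap between consecutive samples is labelled truly missing when it exceeds the upper bound of mean ± z·SD of that variable's inter-sample times, with z = 1.96 as an assumed normal-approximation construction of the 95% interval. The function names natural_cubic_spline, align_to_template and classify_gaps are hypothetical.

```python
import bisect
import math
import statistics


def natural_cubic_spline(xs, ys):
    """Return a callable for the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Outside [xs[0], xs[-1]] the callable
    returns NaN instead of extrapolating.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Second derivatives at the knots; natural boundary: m[0] = m[-1] = 0.
    m = [0.0] * n
    if n > 2:
        # Tridiagonal system for the interior second derivatives.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        # Thomas algorithm: forward elimination...
        for k in range(1, n - 2):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        # ...then back substitution.
        interior = [0.0] * (n - 2)
        interior[-1] = rhs[-1] / diag[-1]
        for k in range(n - 4, -1, -1):
            interior[k] = (rhs[k] - sup[k] * interior[k + 1]) / diag[k]
        m[1:n - 1] = interior

    def evaluate(x):
        if x < xs[0] or x > xs[-1]:
            return math.nan
        # Segment index i such that xs[i] <= x <= xs[i + 1].
        i = min(bisect.bisect_right(xs, x) - 1, n - 2)
        hi = h[i]
        left, right = xs[i + 1] - x, x - xs[i]
        return (m[i] * left ** 3 / (6.0 * hi)
                + m[i + 1] * right ** 3 / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * right)

    return evaluate


def align_to_template(series):
    """Resample all variables onto the template variable's time stamps.

    series maps a name to a time-sorted list of (time, value) pairs.
    Assumption: the variable with the most samples is the template.
    """
    template = max(series, key=lambda name: len(series[name]))
    grid = [t for t, _ in series[template]]
    aligned = {}
    for name, samples in series.items():
        times = [t for t, _ in samples]
        values = [v for _, v in samples]
        if name == template:
            aligned[name] = values
        else:
            spline = natural_cubic_spline(times, values)
            aligned[name] = [spline(t) for t in grid]
    return template, grid, aligned


def classify_gaps(times, z=1.96):
    """Label each inter-sample gap as 'sampling' or 'missing'.

    Assumption: gaps above mean + z * stdev of the inter-sample times are
    truly missing; shorter gaps reflect the variable's normal sampling rate.
    Requires at least two time stamps.
    """
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = statistics.mean(gaps)
    sd = statistics.stdev(gaps) if len(gaps) > 1 else 0.0
    upper = mean + z * sd
    return [(a, b, "missing" if (b - a) > upper else "sampling")
            for a, b in zip(times, times[1:])]
```

In this sketch, template time stamps that fall outside another variable's observed range yield NaN rather than extrapolated values, so interpolation never produces data beyond the last real sample. The z parameter of classify_gaps can be changed if a different interval construction is preferred.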
