Data-driven missing data imputation in cluster monitoring system based on deep neural network

Due to cluster instability, data are not always complete in the cluster monitoring system. This paper focuses on missing data imputation for cluster monitoring applications and proposes a new hybrid multiple imputation framework. The approach differs from conventional multiple imputation techniques in that it imputes missing data under an arbitrary missing pattern by combining a model-based component with a data-driven one. A deep neural network serves as the data model and extracts deep features from the data; these features are then processed by a regression or data-driven strategy to estimate the missing values. The paper gives evidence that if a deep neural network can be trained to construct deep features of the data, imputation based on those features is better than imputation performed directly on the original data. In the experiments, the proposed method is compared with conventional multiple imputation approaches across varying missing data patterns, missing ratios, and datasets, including real cluster data. The results show that when the data have a larger missing ratio and varied missing patterns, the proposed algorithm achieves more accurate and stable imputation performance.
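The feature-then-regress idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: a one-hidden-layer tanh autoencoder written in NumPy stands in for the deep neural network, ridge regression stands in for the regression strategy, and the result is a single imputation rather than a multiple one. The function names (`train_encoder`, `impute_with_features`), the hyperparameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def train_encoder(Z, hidden=4, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh autoencoder by full-batch gradient descent.

    Assumption: this small network is only a stand-in for a deep feature extractor.
    Returns a function that maps rows of Z to hidden features.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)           # hidden features
        R = H @ W2 + b2                    # reconstruction
        G = 2.0 * (R - Z) / n              # gradient of the squared error w.r.t. R
        GH = (G @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Z.T @ GH); b1 -= lr * GH.sum(axis=0)
    return lambda A: np.tanh(A @ W1 + b1)

def impute_with_features(X, hidden=4, ridge=1e-3):
    """Impute NaNs in X by regressing each column on learned features."""
    mask = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(mask, col_mean, X)          # initial mean fill
    mu, sd = filled.mean(axis=0), filled.std(axis=0) + 1e-8
    Z = (filled - mu) / sd                        # standardize before training
    encode = train_encoder(Z, hidden=hidden)
    F = np.c_[encode(Z), np.ones(len(Z))]         # features plus a bias column
    out = filled.copy()
    for j in range(X.shape[1]):
        miss = mask[:, j]
        if not miss.any() or miss.all():
            continue                              # nothing to impute or nothing to fit
        A = F[~miss]
        # Ridge regression from features to the observed values of column j.
        w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]),
                            A.T @ filled[~miss, j])
        out[miss, j] = F[miss] @ w
    return out

# Usage on synthetic correlated data (hypothetical, not cluster monitoring data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X_full = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.2] = np.nan   # about 20% missing at random
X_imputed = impute_with_features(X_missing)
```

Because the mask is applied per entry, the sketch handles an arbitrary missing pattern: each column is regressed only on rows where it was observed, and observed entries are left untouched.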
