Data imputation and dimensionality reduction using deep learning in industrial data

Due to human error, transmission noise, and other interference, some of the data collected from industrial process systems may be lost during acquisition, which degrades overall data quality. In addition, the data collected by industrial control systems are typically high-dimensional. To facilitate downstream processing such as anomaly detection, the “curse of dimensionality” must be addressed so that useful and meaningful information can be extracted from massive high-dimensional data. The features learned by DBNs (Deep Belief Networks) are usually not confined to a low-dimensional surface, and they can represent high-dimensional nonlinear functions of many variables well. This paper therefore applies DBNs to two data processing problems in industrial control systems: missing-value imputation and dimensionality reduction. Experiments show that the DBN-based approach improves imputation accuracy and that reducing the dimensionality of the data supports effective information extraction.
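The abstract does not give the network configuration, so the following is only a minimal sketch of the building block involved, under stated assumptions. A DBN is built by stacking restricted Boltzmann machines (RBMs) and training them greedily, one layer at a time. The sketch trains a single Bernoulli RBM with one-step contrastive divergence (CD-1), uses its hidden-unit probabilities as a lower-dimensional code, and fills missing entries by repeatedly reconstructing them while the observed entries stay fixed. It is not the authors' implementation: the class name `TinyRBM`, the hyperparameters, the mean-field imputation loop, and the toy data are all hypothetical, and real industrial signals would first need to be scaled to [0, 1].

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyRBM:
    """Hypothetical minimal Bernoulli RBM (one DBN layer) trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # Small random weights; visible (b) and hidden (c) biases start at zero.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible
        self.c = [0.0] * n_hidden

    def hidden_probs(self, v):
        # p(h_j = 1 | v) for every hidden unit j.
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # p(v_i = 1 | h) for every visible unit i.
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def fit(self, data, epochs=200, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                # Positive phase: hidden probabilities driven by the data.
                h0 = self.hidden_probs(v0)
                # Negative phase (CD-1): sample h, reconstruct v, recompute h.
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # Gradient estimate: data statistics minus reconstruction statistics.
                for i in range(self.nv):
                    for j in range(self.nh):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b[i] += lr * (v0[i] - v1[i])
                for j in range(self.nh):
                    self.c[j] += lr * (h0[j] - h1[j])
        return self

    def transform(self, v):
        # Reduced representation: n_hidden activation probabilities.
        return self.hidden_probs(v)

    def impute(self, v, steps=20):
        # None marks a missing entry; missing values start at 0.5.
        missing = [i for i, x in enumerate(v) if x is None]
        cur = [0.5 if x is None else float(x) for x in v]
        for _ in range(steps):
            recon = self.visible_probs(self.hidden_probs(cur))
            for i in missing:
                cur[i] = recon[i]  # observed entries stay clamped
        return cur


if __name__ == "__main__":
    # Toy binary "sensor states": two complementary operating patterns.
    data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4
    rbm = TinyRBM(n_visible=6, n_hidden=2).fit(data)
    print(rbm.transform([1, 1, 1, 0, 0, 0]))     # 6 inputs -> 2 features
    print(rbm.impute([1, 1, None, 0, 0, None]))  # fill the two gaps
```

Stacking a second `TinyRBM` on the outputs of `transform` would give a two-layer, DBN-style encoder; the layer sizes, stacking depth, and any fine-tuning used in the paper are not described in this abstract.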
