Removal and interpolation of missing values using wavelet neural network for heterogeneous data sets

Missing data are common and can significantly affect the conclusions drawn from a data set. In statistics, missing data, or missing values, occur when no value is stored for a variable in an observation. Missing values cause several problems, such as loss of information available for computation and analysis. They can also produce misleading results by introducing bias, that is, a systematic difference between the observed and the unobserved data. This paper presents a methodological framework for developing an automated data imputation model based on a wavelet neural network (WNN). The WNN uses adaptive higher-order functions, or different wavelet functions, as the kernel of the network in place of the usual neuron activation function. A wavelet is a wave-like oscillation with an amplitude that starts at zero, increases, and then decreases back to zero. Wavelets are generally crafted to have specific properties that make them useful for signal processing. Six data sets, covering real, integer and simulated data, are subjected to a perturbation experiment based on the random generation of missing values. A neural network (NN) and the WNN are applied to the glass identification, wine recognition, heart disease, leukemia, breast cancer and lung cancer data sets to estimate the missing values, and the results are compared with several classic imputation procedures. Across the different performance measures considered, the WNN improves the quality of a database with missing values and gives the best results for the different variables.
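The two ingredients described above, a wavelet used in place of a neuron activation function and a perturbation experiment that removes values at random, can be illustrated with a minimal sketch. It is not the authors' implementation. The Mexican hat wavelet is an assumed choice, since the abstract does not name a wavelet family, and mean imputation stands in for the unspecified "classic imputation procedures". All function names (`mexican_hat`, `wavelon`, `perturb`, `mean_impute`, `rmse_on_missing`) are hypothetical, the toy data are synthetic, and training of the WNN itself is not shown.

```python
import math
import random

def mexican_hat(t):
    """Unnormalised Mexican hat wavelet: (1 - t^2) * exp(-t^2 / 2).
    Assumed here for illustration; it decays to zero in both tails."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def wavelon(x, weights, translation, dilation):
    """One hidden unit of a WNN-style network: the wavelet is applied to a
    shifted and scaled weighted sum, replacing a sigmoid-type activation."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return mexican_hat((z - translation) / dilation)

def perturb(rows, rate, rng):
    """Copy rows, setting each entry to None independently with probability rate."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def mean_impute(rows):
    """Baseline: fill each missing entry with the observed mean of its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def rmse_on_missing(truth, perturbed, imputed):
    """Root mean squared error over the entries that were removed."""
    errs = [(t - i) ** 2
            for tr, pr, ir in zip(truth, perturbed, imputed)
            for t, p, i in zip(tr, pr, ir) if p is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0

# Synthetic toy data (not one of the six data sets used in the paper).
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
holed = perturb(data, 0.1, rng)      # remove roughly 10% of the values
filled = mean_impute(holed)          # baseline imputation
print("baseline RMSE on removed entries:", rmse_on_missing(data, holed, filled))
print("example wavelon output:", wavelon(data[0], [0.5, -0.2, 0.1, 0.3], 0.0, 1.0))
```

In a comparison of the kind the abstract describes, an imputation model would be scored on the same removed entries as the baseline, so that the error measures are directly comparable.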
