Counter propagation auto-associative neural network based data imputation

In this paper, we propose two novel methods for data imputation: a counterpropagation auto-associative neural network (CPAANN), and grey system theory (GST) hybridised with CPAANN. The effectiveness of these methods is demonstrated on 12 datasets, and the results are compared with those of various extant methods. A Wilcoxon signed-rank test conducted at the 1% level of significance indicated that the difference between the proposed methods and each of the compared methods is statistically significant. The success of CPAANN can be attributed to the local learning, global approximation and auto-association that take place in tandem in a single architecture. Furthermore, CPAANN turned out to be the best in the class of AANN architectures used for imputation. The reason could be the competitive learning that is intrinsic to the CPAANN architecture but conspicuously absent from other auto-associative neural network architectures.
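To illustrate how competitive learning and auto-association can be combined for imputation, the following is a minimal sketch and not the authors' implementation. It assumes a forward-only counterpropagation network whose competitive (Kohonen) layer is trained with a winner-take-all rule and whose output (Grossberg) layer learns to reproduce the input record. The function names `train_cpaann` and `impute`, the learning rates, the number of units and the synthetic data are all hypothetical choices made for this example; the GST hybrid is not covered.

```python
import numpy as np

def train_cpaann(X, n_units=4, epochs=50, alpha=0.3, beta=0.1, seed=0):
    """Train on complete records X (rows = records). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise both weight sets from randomly chosen records.
    idx = rng.choice(len(X), size=n_units, replace=False)
    W_in = X[idx].copy()    # competitive (Kohonen) layer weights
    W_out = X[idx].copy()   # output (Grossberg) layer weights
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            # Competitive step: only the closest unit is updated.
            winner = np.argmin(np.sum((W_in - x) ** 2, axis=1))
            W_in[winner] += alpha * (x - W_in[winner])
            # Auto-association: the target of the output layer is the input itself.
            W_out[winner] += beta * (x - W_out[winner])
    return W_in, W_out

def impute(record, W_in, W_out):
    """Fill NaN entries of one record using the winning unit's output weights."""
    missing = np.isnan(record)
    # Find the winner using only the observed features.
    dist = np.sum((W_in[:, ~missing] - record[~missing]) ** 2, axis=1)
    winner = np.argmin(dist)
    filled = record.copy()
    filled[missing] = W_out[winner, missing]
    return filled

# Synthetic data for illustration only: two well-separated clusters.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 3)),
               data_rng.normal(10.0, 0.1, (20, 3))])
W_in, W_out = train_cpaann(X, n_units=2)
filled = impute(np.array([10.1, np.nan, 9.9]), W_in, W_out)
print(filled)
```

In this sketch the observed entries of a record are left unchanged, and only the missing entries are taken from the output weights of the winning unit.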
