A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

Abstract: All the imputation techniques proposed so far in the literature are offline techniques: they require a number of iterations to learn the characteristics of the data during training, and they consume a lot of computational time. Hence, these techniques are not suitable for applications that require imputation to be performed on demand and in near real time. This paper proposes a computational intelligence based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has two stages. In Stage 1, the Evolving Clustering Method (ECM) is used to replace the missing values with cluster centers, as part of the local learning strategy. Stage 2 refines the resulting approximate values using a General Regression Neural Network (GRNN), as part of the global approximation strategy. The extended offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, K-Medoids+GRNN, and the proposed online imputation method, ECM+GRNN, is statistically insignificant at the 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be employed for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in Stage 2 uniformly reduced MAPE values in both offline and online imputation methods on all datasets.
Keywords: Data Imputation, General Regression Neural Network (GRNN), Evolving Clustering Method (ECM), Imputation, K-Medoids clustering, K-Means clustering, MLP
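
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The clustering step is a simplified one-pass, distance-threshold procedure standing in for ECM (it keeps a running mean per cluster and omits ECM's radius updates). Stage 2 is read as a GRNN, in Specht's kernel-weighted form, trained on the complete records to predict each missing attribute from the remaining attributes, with the Stage 1 values filling any other gaps in the input. MAPE is taken in its usual form, 100/n times the sum of |(actual - predicted) / actual|. The function names (`one_pass_centers`, `grnn_predict`, `impute`, `mape`) and the parameters `dthr` and `sigma` are hypothetical.

```python
import numpy as np

def one_pass_centers(X, dthr):
    """Simplified one-pass clustering (an ECM-like stand-in, not ECM itself).

    A record farther than dthr from every existing center starts a new
    cluster; otherwise it updates the running mean of its nearest center.
    """
    X = np.asarray(X, dtype=float)
    centers, counts = [X[0].copy()], [1]
    for x in X[1:]:
        dists = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(dists))
        if dists[j] > dthr:
            centers.append(x.copy())
            counts.append(1)
        else:
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]  # running mean
    return np.array(centers)

def grnn_predict(X_train, y_train, x, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    # Subtracting the minimum distance leaves the ratio unchanged
    # and avoids a zero denominator from underflow.
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    return float(w @ y_train / w.sum())

def impute(record, complete, centers, sigma):
    """Return (stage-1 estimate, stage-2 estimate) for a record with NaNs.

    Assumes the record has at least one observed attribute.
    """
    record = np.asarray(record, dtype=float)
    miss = np.isnan(record)
    obs = ~miss
    # Stage 1: nearest center, measured on the observed attributes only.
    j = np.argmin(np.linalg.norm(centers[:, obs] - record[obs], axis=1))
    stage1 = np.where(miss, centers[j], record)
    # Stage 2 (assumed reading): predict each missing attribute from the others.
    stage2 = stage1.copy()
    for k in np.flatnonzero(miss):
        others = np.arange(record.size) != k
        stage2[k] = grnn_predict(complete[:, others], complete[:, k],
                                 stage1[others], sigma)
    return stage1, stage2

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```

In this sketch the centers are built in a single pass over the complete records, which is what makes the clustering step suitable for on-demand use; the GRNN likewise needs no iterative training, only a choice of the smoothing parameter `sigma`.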
