Hybrid model for data imputation: Using fuzzy c means and multi layer perceptron

Databases store datasets that are not always complete. Some records contain missing fields, which may result from human or system error during data collection. Data imputation is the process of filling in the missing values to produce complete records. Complete databases can be analyzed more accurately than incomplete ones. This paper proposes a two-stage hybrid model that fills in missing values using fuzzy c-means (FCM) clustering and a multilayer perceptron (MLP) working in sequence, and compares it with k-means imputation and FCM imputation. The accuracy of the model is measured using Mean Absolute Percentage Error (MAPE). The MAPE values obtained show that the proposed model is more accurate at filling multiple missing values in a record than stage 1 alone.
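As an illustration of the evaluation metric, the following is a minimal sketch of MAPE under its common textbook definition: the mean of |actual - imputed| / |actual|, expressed as a percentage. The abstract does not state the exact formula used in the paper, so this definition is an assumption, and the function name `mape` and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (common definition, assumed here)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one value")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # The relative error is undefined when the true value is zero.
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Hypothetical data: true values that were hidden, and the values an imputer filled in.
true_values = [10.0, 20.0, 40.0]
imputed_values = [11.0, 18.0, 40.0]
score = mape(true_values, imputed_values)
print(f"MAPE: {score:.2f}%")
```

A lower MAPE means the imputed values lie closer to the true ones, which is the sense in which one imputation method can be called more accurate than another.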
