Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns

Highlights: Imputation of data for monotone patterns of missing values. An estimation model for missing data based on a multilayer perceptron. Multiple imputation combining a neural network with k-nearest neighbours. Comparison of the performance of the proposed models with three classic single imputation procedures: mean/mode, regression and hot-deck.

The knowledge discovery process is supported by information gathered from collected data sets, which often contain errors in the form of missing values. Data imputation is the activity of estimating values for missing data items. This study focuses on the development of automated data imputation models, based on artificial neural networks, for monotone patterns of missing values. The present work proposes a single imputation approach relying on a multilayer perceptron trained with different learning rules, and a multiple imputation approach based on the combination of a multilayer perceptron and k-nearest neighbours. Eighteen real and simulated databases were subjected to a perturbation experiment in which monotone missing data patterns were generated at random. An empirical test was carried out on these data sets with both proposed approaches (single and multiple imputation) and with three classical single imputation procedures: mean/mode imputation, regression and hot-deck. The experiments therefore involved five imputation methods. The results, across different performance measures, showed that, compared with traditional tools, both proposals improve the level of automation and the data quality while offering satisfactory performance.
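As an illustration of the setting described above, the following is a minimal sketch of a perturbation step that produces a monotone missing data pattern, together with the mean imputation baseline for numeric columns. It is not the authors' procedure: the function names (`make_monotone_missing`, `is_monotone`, `mean_impute`), the `rate` and `start_col` parameters, and the fallback value for fully missing columns are assumptions made for this example. A pattern is treated as monotone when, under a fixed column order, a missing value in one column implies that all later columns of the same record are also missing.

```python
import random


def make_monotone_missing(rows, start_col, rate, seed=0):
    """Return a copy of rows in which a random subset of records has every
    column from a random cut point (>= start_col) onward set to None.
    Hypothetical perturbation scheme, for illustration only."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    out = []
    for row in rows:
        new_row = list(row)
        if rng.random() < rate:
            cut = rng.randrange(start_col, n_cols)
            for j in range(cut, n_cols):
                new_row[j] = None
        out.append(new_row)
    return out


def is_monotone(rows):
    """True if, in every record, once a value is missing
    all later columns are missing as well."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                return False
    return True


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.
    Assumption: a column with no observed values falls back to 0.0."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
    perturbed = make_monotone_missing(data, start_col=1, rate=0.5, seed=1)
    print(is_monotone(perturbed))
    print(mean_impute(perturbed))
```

Categorical columns would need mode rather than mean imputation, and the regression, hot-deck, multilayer perceptron and k-nearest neighbours methods named in the abstract would replace `mean_impute` with a model fitted on the complete records.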
