Performance Evaluation for Class Center-Based Missing Data Imputation Algorithm

An imputation method should be able to reproduce the actual values in the data, known as predictive accuracy (PAC), and to preserve the distribution of those values, known as distributional accuracy (DAC). However, most studies evaluate imputation performance only through classification accuracy. For classification problems, class center-based methods for missing data imputation have been developed and shown to outperform other methods on numeric and mixed data types. This paper evaluates the accuracy of a class center-based method for missing data imputation that has been modified to take the correlation between attributes into account. The method achieves an average correlation coefficient r of 0.96, together with the lowest average MSE and DKS (Kolmogorov-Smirnov distance) values, 0.04 and 0.03 respectively. These results show that the imputation method is more effective and preserves the distribution of the actual data values.
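The evaluation above measures PAC with the correlation coefficient r and MSE, and DAC with the Kolmogorov-Smirnov distance DKS, all computed between the true and imputed values of the cells that were missing. The Python sketch below illustrates that evaluation under stated assumptions: the imputer is plain class-mean imputation (each missing value is filled with the mean of that attribute over the observed values of the same class), a simplified stand-in that ignores the attribute correlations used by the modified method in this paper; all function names and the toy data are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean


def class_center_impute(rows, labels):
    """Fill each None with the class mean of that attribute (hypothetical helper).

    Simplified stand-in for class center-based imputation: it ignores the
    attribute correlations that the paper's modified method considers.
    Assumes every class has at least one observed value per attribute.
    """
    n_cols = len(rows[0])
    centers = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        # Class center: per-attribute mean over observed (non-missing) values.
        centers[cls] = [
            mean(r[j] for r in members if r[j] is not None) for j in range(n_cols)
        ]
    return [
        [centers[y][j] if v is None else v for j, v in enumerate(row)]
        for row, y in zip(rows, labels)
    ]


def pearson_r(actual, imputed):
    """PAC: Pearson correlation between true and imputed values."""
    ma, mi = mean(actual), mean(imputed)
    cov = sum((a - ma) * (b - mi) for a, b in zip(actual, imputed))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_i = sum((b - mi) ** 2 for b in imputed)
    return cov / sqrt(var_a * var_i)


def mse(actual, imputed):
    """PAC: mean squared error between true and imputed values."""
    return mean((a - b) ** 2 for a, b in zip(actual, imputed))


def ks_distance(actual, imputed):
    """DAC: two-sample Kolmogorov-Smirnov statistic (max gap between empirical CDFs)."""
    xs, ys = sorted(actual), sorted(imputed)
    gap = 0.0
    for v in xs + ys:
        f_x = bisect_right(xs, v) / len(xs)
        f_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(f_x - f_y))
    return gap


# Hypothetical toy data: two classes, None marks a missing cell.
rows = [
    [1.0, 2.0], [None, 4.0], [3.0, None],
    [10.0, 20.0], [12.0, None], [None, 22.0],
]
labels = ["a", "a", "a", "b", "b", "b"]
missing = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]
completed = class_center_impute(rows, labels)
imputed = [completed[i][j] for i, j in missing]  # [2.0, 3.0, 21.0, 11.0]
actual = [2.5, 3.5, 21.0, 10.5]  # assumed true values of the missing cells, same order

print(pearson_r(actual, imputed), mse(actual, imputed), ks_distance(actual, imputed))
```

Averaging these three metrics over attributes and over repeated random missingness patterns would give summary figures of the same kind as those reported above. On real data, `scipy.stats.ks_2samp` computes the same two-sample KS statistic and may be preferable to the hand-written version.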
