Missing Values Estimation on Multivariate Dataset: Comparison of Three Types of Approaches

Knowledge discovery has become increasingly essential in the digital era. The very first step of knowledge discovery is data acquisition. Data can be gathered automatically from many fields, so the data types also vary. The result of the data acquisition process is often a very large dataset. Observed data can be obtained by several methods, such as sensor recording or frequent observation. Every analysis process consists of several steps, one of which is preprocessing. Preprocessing is the step in which the data are examined and selected, and problems in them are handled. Missing values handling is part of the preprocessing step. The purpose of this research is to find out which missing values handling approach works better on this type of dataset. This research compares three approaches with one another: Mode Imputation, Decision Tree, and Class Center based Missing Values Imputation. To make a fair comparison among them, several scenarios of missing values appearance were constructed. The dataset scenarios for this study were produced by artificially "deleting" values, so that the performance of each method can be measured against the known original values. In the evaluation, the Decision Tree method shows consistent performance even with different amounts of missing points. Numerically, its results are slightly lower than those of the other methods.
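The evaluation protocol described in the abstract (delete known values artificially, impute them, then compare the imputed values with the originals) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cells are assumed to be deleted completely at random, the features are assumed numeric, and accuracy is assumed to be measured as RMSE over the deleted cells only, since the abstract states neither the deletion mechanism nor the metric. `class_center_impute` is a simplified stand-in that only uses the class center (the per-class mean of each attribute); the published class center based method includes further steps that are not reproduced here. The Decision Tree approach is not sketched. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

# Sentinel for an artificially deleted ("missing") cell.
MISSING = None


def delete_at_random(rows, rate, seed=0):
    """Delete a fraction `rate` of all cells uniformly at random (assumed MCAR).

    Returns a damaged copy of `rows` plus the deleted (row, col) positions,
    so imputed values can later be scored against the true ones.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[i]))]
    deleted = rng.sample(cells, int(rate * len(cells)))
    damaged = [list(r) for r in rows]
    for i, j in deleted:
        damaged[i][j] = MISSING
    return damaged, deleted


def mode_impute(rows):
    """Fill each missing cell with the most frequent observed value of its column.

    Assumes every column keeps at least one observed value.
    """
    modes = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def class_center_impute(rows, labels):
    """Simplified class-center imputation (hypothetical stand-in).

    A missing cell takes the mean of the observed values of its column within
    the record's own class (that class's center). If the class has no observed
    value in that column, the overall column mean is used instead.
    """
    n_cols = len(rows[0])
    centers = {}
    for c in set(labels):
        for j in range(n_cols):
            vals = [r[j] for r, y in zip(rows, labels) if y == c and r[j] is not MISSING]
            if vals:
                centers[(c, j)] = sum(vals) / len(vals)
    overall = []
    for j in range(n_cols):
        vals = [r[j] for r in rows if r[j] is not MISSING]
        overall.append(sum(vals) / len(vals))
    return [[centers.get((y, j), overall[j]) if v is MISSING else v
             for j, v in enumerate(r)]
            for r, y in zip(rows, labels)]


def rmse(original, imputed, deleted):
    """Root-mean-square error computed only over the deleted cells."""
    errors = [(original[i][j] - imputed[i][j]) ** 2 for i, j in deleted]
    return math.sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    # Toy two-class dataset with integer-valued features (illustrative only).
    data = [[1, 2, 1], [1, 3, 1], [2, 2, 1], [1, 2, 2],
            [7, 8, 9], [8, 8, 9], [7, 9, 8], [7, 8, 9]]
    labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
    # Several missing-value scenarios, mirroring the "different amounts of
    # missing points" compared in the abstract.
    for rate in (0.1, 0.2, 0.3):
        damaged, deleted = delete_at_random(data, rate, seed=42)
        print(rate,
              round(rmse(data, mode_impute(damaged), deleted), 3),
              round(rmse(data, class_center_impute(damaged, labels), deleted), 3))
```

Scoring only the deleted cells is the point of deleting values artificially: the true value of every missing cell is known, so each method's error can be measured directly, and repeating the run at several deletion rates shows how consistent each method is as the amount of missing data grows.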
