Application of the Modified Imputation Method to Missing Data to Increase Classification Performance

Incomplete or missing data diminishes the effectiveness of statistical results and may produce biased estimates, which in turn lead to unsound judgment. Inefficiency and impediments in data treatment and analysis, which are among the problems linked with missing values, may affect the supervised learning process and reduce the classification accuracy and performance of the prediction model in a data mining task. This study applied the modified imputation method, which was previously tested with well-known imputation algorithms, to established classification techniques, namely Naive Bayes, One-R, k-Nearest Neighbor (kNN), C4.5, and Support Vector Machine (SVM), using open data sets from the UCI Repository. Classification performance in terms of precision, accuracy, and Receiver Operating Characteristic (ROC) was examined with the Weka tool before and after imputation. The results show that classification performance improved when the modified imputation method was applied to the data sets during preprocessing, compared with the same data sets with missing values.
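As an illustration of the three measures named above, the following minimal Python sketch computes accuracy, precision, and the area under the ROC curve for a binary task. It is not the Weka implementation used in the study; the function names, the binary setting, and the toy labels and scores are assumptions made only for this example.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """True positives divided by all instances predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute ROC AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]

print(accuracy(y_true, y_pred), precision(y_true, y_pred), roc_auc(y_true, scores))
```

Running the same three functions on a classifier's predictions before and after imputation gives the kind of side-by-side comparison the study reports.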
