A Comparative Study of Imputation Methods to Predict Missing Attribute Values in Coronary Heart Disease Data Set

The objective of this research is to investigate the effects of missing attribute value imputation methods on the quality of the rules extracted when rule filtering is applied. Three imputation methods, Artificial Neural Network with Rough Set Theory (ANNRST), k-Nearest Neighbor (k-NN) and Concept Most Common Attribute Value Filling (CMCF), are applied to the University of California, Irvine (UCI) coronary heart disease data sets. The Rough Set Theory (RST) method is then used to generate rules from each of the three imputed data sets, and support filtering is used to select the rules. Accuracy, coverage, sensitivity, specificity and the Area Under the Curve (AUC) from Receiver Operating Characteristic (ROC) analysis are used to evaluate the performance of the rules when they are applied to classify the complete testing data set. The evaluation results show that ANNRST performs best of the three methods, ahead of both k-NN and CMCF.
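The two baseline imputers named above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: attributes are already discretized into small integer codes, `None` marks a missing value, CMCF takes the most common observed value within the same decision class (concept), and k-NN uses a Euclidean distance over the attributes both cases have observed, filling the gap by majority vote among the k nearest donors. The function names `cmcf_impute` and `knn_impute` and the toy data are hypothetical. ANNRST is not sketched here.

```python
import math
from collections import Counter
from typing import List, Optional

Row = List[Optional[int]]  # None marks a missing attribute value (assumption)


def cmcf_impute(rows: List[Row], labels: List[int]) -> List[Row]:
    """Concept Most Common attribute value Filling (CMCF), as assumed here:
    replace each missing value with the most frequent observed value of that
    attribute among cases sharing the same decision class (concept)."""
    filled = [list(r) for r in rows]  # copy so the input is not modified
    for j in range(len(rows[0])):
        # Count observed values of attribute j separately for each concept.
        per_concept = {}
        for r, y in zip(rows, labels):
            if r[j] is not None:
                per_concept.setdefault(y, Counter())[r[j]] += 1
        for r, y in zip(filled, labels):
            if r[j] is None and y in per_concept:
                r[j] = per_concept[y].most_common(1)[0][0]
    return filled


def knn_impute(rows: List[Row], k: int = 3) -> List[Row]:
    """k-NN imputation (hypothetical variant): fill a missing value with the
    majority value among the k nearest cases that have that attribute observed.
    Distance: Euclidean over co-observed attributes, averaged per attribute."""
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have attribute j observed
                shared = [a for a in range(len(r))
                          if r[a] is not None and other[a] is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((r[a] - other[a]) ** 2 for a in shared) / len(shared))
                donors.append((d, other[j]))
            donors.sort(key=lambda t: t[0])
            votes = Counter(val for _, val in donors[:k])
            if votes:
                filled[i][j] = votes.most_common(1)[0][0]
    return filled


if __name__ == "__main__":
    rows = [[1, 0, 2], [1, None, 2], [1, 0, 1],
            [0, 1, 0], [0, None, 0], [0, 1, 1]]
    labels = [1, 1, 1, 0, 0, 0]
    # On this toy data both methods fill row 1 with 0 and row 4 with 1.
    print(cmcf_impute(rows, labels))
    print(knn_impute(rows, k=3))
```

The key design difference is what each method conditions on: CMCF uses the class label of the incomplete case, so it can only be applied where the decision is known, while k-NN uses only the other attribute values.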

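Support filtering and the evaluation measures listed in the abstract can likewise be sketched. This is an illustrative sketch under stated assumptions, not the procedure used in the study: each rule is a hypothetical triple of conditions, decision and support; a test case is classified by a support-weighted vote among the rules that fire; a case that no rule matches counts only against coverage; accuracy, sensitivity and specificity are computed over covered cases with class 1 taken as the positive (disease) class; and AUC is computed from the vote scores with the Mann-Whitney formulation, which equals the area under the ROC curve. The helper names are hypothetical, and the sketch assumes at least one covered case of each class.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule representation: ({attribute_index: value}, decision, support)
Rule = Tuple[Dict[int, int], int, int]


def filter_by_support(rules: List[Rule], min_support: int) -> List[Rule]:
    """Keep only rules whose support is at least min_support."""
    return [r for r in rules if r[2] >= min_support]


def score_case(rules: List[Rule], case: List[int], positive: int = 1) -> Optional[float]:
    """Support-weighted fraction of votes for the positive class,
    or None if no rule fires (the case is not covered)."""
    votes: Dict[int, int] = {}
    for conds, decision, support in rules:
        if all(case[a] == v for a, v in conds.items()):
            votes[decision] = votes.get(decision, 0) + support
    total = sum(votes.values())
    if total == 0:
        return None
    return votes.get(positive, 0) / total


def auc(scores: List[float], truth: List[int]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2), i.e. the Mann-Whitney form of the ROC AUC."""
    pos = [s for s, y in zip(scores, truth) if y == 1]
    neg = [s for s, y in zip(scores, truth) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(rules: List[Rule], cases: List[List[int]], truth: List[int],
             threshold: float = 0.5) -> Dict[str, float]:
    scored = [(score_case(rules, c), y) for c, y in zip(cases, truth)]
    covered = [(s, y) for s, y in scored if s is not None]
    preds = [(1 if s >= threshold else 0, y) for s, y in covered]
    tp = sum(1 for p, y in preds if p == 1 and y == 1)
    tn = sum(1 for p, y in preds if p == 0 and y == 0)
    fp = sum(1 for p, y in preds if p == 1 and y == 0)
    fn = sum(1 for p, y in preds if p == 0 and y == 1)
    return {
        "coverage": len(covered) / len(cases),
        "accuracy": (tp + tn) / len(preds),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": auc([s for s, _ in covered], [y for _, y in covered]),
    }


if __name__ == "__main__":
    rules = [({0: 1}, 1, 10), ({0: 0}, 0, 8), ({1: 1}, 1, 2), ({2: 2}, 1, 6)]
    cases = [[1, 0, 2], [0, 1, 0], [0, 0, 2], [2, 1, 1]]
    truth = [1, 0, 1, 0]
    print(evaluate(filter_by_support(rules, 5), cases, truth))
```

Raising the support threshold removes weak rules, which tends to trade coverage for reliability: in the toy example, dropping the support-2 rule leaves the last test case unclassified instead of misclassified.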