A Comparison of Several Approaches to Missing Attribute Values in Data Mining

This paper presents and compares nine approaches to handling missing attribute values. Ten input data files were used to investigate the performance of the nine methods. For testing, both the naive classification and the new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error rate achieved by ten-fold cross-validation. Using the Wilcoxon matched-pairs signed rank test, we conclude that the C4.5 approach and the method of ignoring examples with missing attribute values are the best of the nine approaches, and that the most common attribute-value method is the worst; some methods do not differ significantly from one another. Two further methods performed excellently in our limited experiments: assigning to the missing attribute value all possible values of the attribute, and assigning to it all possible values of the attribute restricted to the same concept. However, we do not have enough evidence to support the claim that these approaches are superior.
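To make four of the compared methods concrete, the following is a minimal sketch, not the LERS or C4.5 implementation. It assumes a data set is a list of tuples whose last element is the concept (decision) and that a missing value is marked with "?"; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

MISSING = "?"  # assumed marker for a missing attribute value


def ignore_missing(rows):
    """Drop every example that has at least one missing attribute value."""
    return [r for r in rows if MISSING not in r[:-1]]


def most_common_value(rows):
    """Replace each missing value with the most frequent known value of
    that attribute (assumes every attribute has at least one known value)."""
    n_attrs = len(rows[0]) - 1
    modes = []
    for j in range(n_attrs):
        counts = Counter(r[j] for r in rows if r[j] != MISSING)
        modes.append(counts.most_common(1)[0][0])
    return [
        tuple(modes[j] if r[j] == MISSING else r[j] for j in range(n_attrs))
        + (r[-1],)
        for r in rows
    ]


def all_possible_values(rows, same_concept=False):
    """Replace each example that has missing values with one copy per
    possible value of the attribute. With same_concept=True, only values
    seen in examples of the same concept are used. An example is dropped
    if no known value is available for one of its missing attributes."""
    n_attrs = len(rows[0]) - 1
    out = []
    for r in rows:
        pool = [x for x in rows if x[-1] == r[-1]] if same_concept else rows
        choices = []
        for j in range(n_attrs):
            if r[j] == MISSING:
                choices.append(sorted({x[j] for x in pool if x[j] != MISSING}))
            else:
                choices.append([r[j]])
        out.extend(combo + (r[-1],) for combo in product(*choices))
    return out


# Toy data: (temperature, headache, concept)
rows = [
    ("high", "yes", "flu"),
    ("high", "?", "flu"),
    ("?", "no", "healthy"),
    ("normal", "no", "healthy"),
    ("high", "no", "healthy"),
]

complete_only = ignore_missing(rows)
imputed = most_common_value(rows)
expanded = all_possible_values(rows)
expanded_by_concept = all_possible_values(rows, same_concept=True)
```

On the toy data, ignoring examples keeps three of the five rows, while the two expansion variants grow the data set to seven and six rows respectively, which illustrates why these methods can differ so much in the size of the training set they hand to the rule inducer.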
