Mining incomplete data with lost values and attribute-concept values

This paper presents an experimental comparison of two interpretations of missing attribute values: lost values and attribute-concept values. Experiments were conducted on 176 data sets, using three kinds of probabilistic approximations (lower, middle and upper) as preprocessing, followed by rule induction with the MLEM2 system. Performance was evaluated using the error rate computed by ten-fold cross validation. Our main objective was to determine which of the two interpretations of missing attribute values yields the lower error rate. In 10 out of 24 cases, lost values performed better. In the remaining 14 cases, the difference in performance was not statistically significant (5% significance level).
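The abstract does not define the two interpretations or the approximations, so the following is only a minimal sketch under definitions commonly used in the rough set literature, all of which are assumptions here: a lost value ("?") places no constraint on a case; an attribute-concept value ("-") stands for any value of that attribute observed among cases of the same concept; the characteristic set K(x) is the intersection of the resulting attribute-value blocks; and a probabilistic approximation with parameter alpha is the union of the sets K(x), for x in the concept X, whose conditional probability Pr(X | K(x)) is at least alpha (alpha = 1 for lower, alpha = 0.5 for middle, a small positive alpha for upper). The function names and the toy data are hypothetical and are not taken from the paper or from MLEM2.

```python
LOST, ATTR_CONCEPT = "?", "-"
MISSING = (LOST, ATTR_CONCEPT)

def characteristic_sets(cases, decisions):
    """Return {case index: characteristic set} (assumed definitions, see lead-in)."""
    universe = set(range(len(cases)))

    def typical(i, a):
        # Specified values of attribute a among cases in the same concept as case i.
        return {cases[j][a] for j in universe
                if decisions[j] == decisions[i] and cases[j][a] not in MISSING}

    # Attribute-value blocks built from specified values only.
    blocks = {}
    for i, case in enumerate(cases):
        for a, v in case.items():
            if v not in MISSING:
                blocks.setdefault((a, v), set()).add(i)
    # An attribute-concept value joins the blocks of its concept's typical values.
    for i, case in enumerate(cases):
        for a, v in case.items():
            if v == ATTR_CONCEPT:
                for t in typical(i, a):
                    blocks[(a, t)].add(i)

    result = {}
    for i, case in enumerate(cases):
        k = set(universe)
        for a, v in case.items():
            if v == LOST:
                continue  # a lost value does not restrict the set
            if v == ATTR_CONCEPT:
                values = typical(i, a)
                if values:
                    k &= set().union(*(blocks[(a, t)] for t in values))
            else:
                k &= blocks[(a, v)]
        result[i] = k
    return result

def probabilistic_approximation(k_sets, concept, alpha):
    """Union of K(x), x in concept, with Pr(concept | K(x)) >= alpha."""
    approx = set()
    for x in concept:
        k = k_sets[x]
        if len(k & concept) >= alpha * len(k):
            approx |= k
    return approx

# Toy data (hypothetical): one missing temperature, read both ways.
decisions = ["flu", "flu", "no", "no"]
concept = {0, 1}  # cases labelled "flu"

def make_cases(missing_symbol):
    return [
        {"temp": "high", "headache": "yes"},
        {"temp": missing_symbol, "headache": "no"},
        {"temp": "normal", "headache": "no"},
        {"temp": "high", "headache": "no"},
    ]

k_lost = characteristic_sets(make_cases(LOST), decisions)
k_attr = characteristic_sets(make_cases(ATTR_CONCEPT), decisions)
middle_lost = probabilistic_approximation(k_lost, concept, 0.5)
middle_attr = probabilistic_approximation(k_attr, concept, 0.5)
print(sorted(middle_lost), sorted(middle_attr))
```

In this toy table the same missing entry yields different middle approximations under the two interpretations, which is the kind of difference that would then propagate into the induced rules and the measured error rate.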
