Characteristic sets and generalized maximal consistent blocks in mining incomplete data

Abstract: Mining incomplete data using approximations based on characteristic sets is a well-established technique. It applies to incomplete data sets under several interpretations of missing attribute values, such as lost values and “do not care” conditions. Maximal consistent blocks, by contrast, were introduced only for incomplete data sets with “do not care” conditions, and only with lower and upper approximations. In this paper, we extend maximal consistent blocks to incomplete data sets with any interpretation of missing attribute values and to probabilistic approximations. We prove new results on probabilistic approximations based on generalized maximal consistent blocks. Additionally, we present results of experiments on mining incomplete data using both characteristic sets and maximal consistent blocks under two interpretations of missing attribute values: lost values and “do not care” conditions. The experiments provide some evidence that the best approach is to use middle probabilistic approximations, based either on characteristic sets or on maximal consistent blocks.
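The abstract names characteristic sets and probabilistic approximations without defining them. The sketch below illustrates one common formulation from the rough-set literature on incomplete data, which is assumed here rather than taken from this abstract: a lost value ("?") keeps a case out of every block of that attribute, a "do not care" condition ("*") places it in every block, and a concept probabilistic approximation is the union of the characteristic sets K(x), for x in the concept X, with Pr(X | K(x)) >= alpha. All function names and the four-case table are hypothetical and made up for illustration. Maximal consistent blocks are not covered.

```python
from typing import Dict, List, Set

LOST = "?"        # lost value: the case belongs to no block of the attribute
DONT_CARE = "*"   # "do not care" condition: the case belongs to every block

Case = Dict[str, str]


def attribute_value_blocks(data: List[Case], attr: str) -> Dict[str, Set[int]]:
    """Map each specified value v of attr to the block [(attr, v)]."""
    values = {row[attr] for row in data if row[attr] not in (LOST, DONT_CARE)}
    blocks: Dict[str, Set[int]] = {v: set() for v in values}
    for i, row in enumerate(data):
        if row[attr] == DONT_CARE:
            for v in values:
                blocks[v].add(i)
        elif row[attr] != LOST:
            blocks[row[attr]].add(i)
    return blocks


def characteristic_sets(data: List[Case], attrs: List[str]) -> List[Set[int]]:
    """K(x): intersect the blocks of x's specified values; missing values impose no constraint."""
    universe = set(range(len(data)))
    blocks = {a: attribute_value_blocks(data, a) for a in attrs}
    result = []
    for row in data:
        k = set(universe)
        for a in attrs:
            if row[a] not in (LOST, DONT_CARE):
                k &= blocks[a][row[a]]
        result.append(k)
    return result


def concept_probabilistic_approximation(
    char_sets: List[Set[int]], concept: Set[int], alpha: float
) -> Set[int]:
    """Union of K(x) for x in the concept with Pr(concept | K(x)) >= alpha."""
    approx: Set[int] = set()
    for x in concept:
        k = char_sets[x]
        if k and len(k & concept) / len(k) >= alpha:
            approx |= k
    return approx


# Hypothetical toy table; cases are indexed 0..3.
data = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "high", "Headache": "*"},
    {"Temperature": "normal", "Headache": "no"},
]
k_sets = characteristic_sets(data, ["Temperature", "Headache"])
concept = {0, 1}  # hypothetical concept, e.g. cases with a positive diagnosis

lower = concept_probabilistic_approximation(k_sets, concept, alpha=1.0)
middle = concept_probabilistic_approximation(k_sets, concept, alpha=0.5)
```

With alpha = 1 this reduces to a lower approximation, and alpha = 0.5 gives the middle approximation mentioned in the abstract. In the toy table the lower approximation is empty while the middle one is {0, 1, 2}, which shows how the threshold changes the set of cases available for rule induction.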
