Data Mining: A Probabilistic Rough Set Approach

This paper introduces a new approach to mining if-then rules from databases with uncertainty and incompleteness. The approach combines the Generalization Distribution Table (GDT) with rough set methodology. A GDT is a table that represents the probabilistic relationships between concepts and instances over discrete domains. Using a GDT as the hypothesis search space and combining it with rough set methodology makes it possible to handle noise and unseen instances, select biases flexibly, use background knowledge to constrain rule generation, and effectively acquire if-then rules with strengths from large, complex databases in an incremental, bottom-up mode. In this paper, we focus on the basic concepts and an implementation of our methodology.
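
To make the idea of a GDT more concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes that a generalization is an attribute-value tuple in which at least one position is a wildcard, that a generalization covers every instance agreeing with it on the non-wildcard positions, and that, with no bias or background knowledge, probability is spread uniformly over the covered instances. The names `build_gdt`, `WILDCARD`, and the toy attribute domains are hypothetical and chosen only for illustration.

```python
from itertools import product

WILDCARD = "*"  # hypothetical marker for "any value of this attribute"


def build_gdt(domains):
    """Build a toy generalization distribution table.

    domains: dict mapping attribute name -> list of discrete values.
    Returns a dict mapping each generalization (a tuple containing at
    least one wildcard) to a dict {instance: probability}.
    Assumption: a uniform distribution over the covered instances.
    """
    names = list(domains)
    # Every possible instance: one concrete value per attribute.
    instances = list(product(*(domains[n] for n in names)))
    # Every possible generalization: a value or a wildcard per attribute,
    # keeping only tuples that contain at least one wildcard.
    generalizations = [
        g
        for g in product(*([*domains[n], WILDCARD] for n in names))
        if WILDCARD in g
    ]
    table = {}
    for g in generalizations:
        covered = [
            inst
            for inst in instances
            if all(gv == WILDCARD or gv == iv for gv, iv in zip(g, inst))
        ]
        # Uniform probability over the instances the generalization covers.
        table[g] = {inst: 1.0 / len(covered) for inst in covered}
    return table


if __name__ == "__main__":
    toy_domains = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"]}
    gdt = build_gdt(toy_domains)
    for generalization, row in gdt.items():
        print(generalization, row)
```

In this sketch each row of the table is a probability distribution over instances, so a bias or background knowledge could be modeled by replacing the uniform assignment with a non-uniform one; how the paper actually does this is not stated in the abstract.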
