Rough sets approach to symbolic value partition

In data mining, finding simple representations of knowledge is an important issue. Attribute reduction, continuous attribute discretization, and symbolic value partition are three preprocessing techniques used for this purpose. This paper investigates the symbolic value partition technique, which divides each attribute domain of a data table into a family of disjoint subsets and constructs a new data table with fewer attributes and smaller attribute domains. Specifically, we investigate the optimal symbolic value partition (OSVP) problem for supervised data, where the optimality metric is the sum of the cardinalities of the new attribute domains. We propose the concept of partition reducts for this problem; an optimal partition reduct is a solution to the OSVP problem. We develop a greedy algorithm that searches for a suboptimal partition reduct, and we analyze the major properties of this algorithm. Empirical studies on various datasets from the UCI library show that our algorithm effectively reduces the size of attribute domains. Furthermore, it helps compute smaller rule sets with better coverage than the attribute reduction approach.
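To make the idea concrete, the following is a minimal Python sketch of symbolic value partition on a toy decision table. It is not the algorithm proposed in the paper: the function names, the toy data, the pairwise merging heuristic, and the choice to treat an attribute whose values collapse into a single block as removed are all assumptions made for illustration. The sketch also assumes that a valid partition must keep the decision table consistent, meaning that objects with identical condition values still share the same decision.

```python
from itertools import combinations

# A decision table is a list of (condition_values, decision) pairs.
# A partition is a list with one dict per attribute: value -> block label.

def identity_partition(rows):
    """Start with every symbolic value in its own block."""
    n_attrs = len(rows[0][0])
    return [{v: v for v in {cond[a] for cond, _ in rows}} for a in range(n_attrs)]

def apply_partition(rows, partition):
    """Replace each value by the label of the block that contains it."""
    return [(tuple(partition[a][v] for a, v in enumerate(cond)), dec)
            for cond, dec in rows]

def is_consistent(rows):
    """True if equal condition tuples never carry different decisions."""
    seen = {}
    for cond, dec in rows:
        if seen.setdefault(cond, dec) != dec:
            return False
    return True

def cardinality_sum(partition):
    """Sum of new domain sizes; single-block attributes count as removed
    (an assumption of this sketch)."""
    sizes = (len(set(blocks.values())) for blocks in partition)
    return sum(k for k in sizes if k > 1)

def greedy_partition(rows):
    """Hypothetical heuristic: merge two blocks of an attribute whenever
    the partitioned table stays consistent, until no merge is possible."""
    partition = identity_partition(rows)
    for a in range(len(partition)):
        merged = True
        while merged:
            merged = False
            labels = sorted(set(partition[a].values()))
            for x, y in combinations(labels, 2):
                # Tentatively relabel block y as block x.
                trial = {v: (x if b == y else b) for v, b in partition[a].items()}
                candidate = partition[:a] + [trial] + partition[a + 1:]
                if is_consistent(apply_partition(rows, candidate)):
                    partition = candidate
                    merged = True
                    break
    return partition

# Toy table (invented for illustration): attributes are colour and size.
rows = [
    (("red", "small"), "yes"),
    (("blue", "small"), "yes"),
    (("green", "large"), "no"),
    (("red", "large"), "no"),
]

result = greedy_partition(rows)
print(cardinality_sum(identity_partition(rows)), "->", cardinality_sum(result))
```

On this toy table all colour values merge into one block, so the colour attribute becomes uninformative, while the two size values must stay separate to keep the table consistent. The cardinality sum therefore drops from 5 to 2. A greedy pass of this kind depends on the order in which attributes and blocks are tried, so it yields at best a suboptimal solution.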
