A Global Discretization and Attribute Reduction Algorithm Based on K-Means Clustering and Rough Sets Theory

Knowledge reduction in rough sets theory applies only to discrete data, whereas most attributes in decision tables are continuous. A global discretization and attribute reduction algorithm based on clustering and rough sets theory is therefore proposed. After a comparison of different discretization methods, the k-means clustering algorithm is chosen. To compensate for the shortcomings of k-means clustering, the F statistic from analysis of variance and the support strength of the condition attributes are introduced to control the quality of the discretization. A reasonable number of clusters is derived from the dependency index so that the discretized table meets the prerequisites of rough sets theory. The attributes are then reduced using rough sets theory, and decision rules are induced. Finally, an example is presented to illustrate the feasibility and effectiveness of the algorithm.
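The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans_1d`, `dependency`, `discretize`, `reduce_attributes`), the toy table, the use of one common cluster number for all attributes, and the stopping threshold of 1.0 on the dependency index are all assumptions made for illustration. The F statistic and support strength checks are omitted, and the reduction step is a simple greedy drop rather than a specific reduct algorithm from the paper.

```python
from collections import defaultdict


def kmeans_1d(values, k, iters=100):
    """Cluster scalar values into k groups; return one ordinal label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): spread centers over the sorted values.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Relabel so that a larger label means a larger cluster center.
    order = sorted(range(k), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[lab] for lab in labels]


def dependency(cond_rows, decisions):
    """Dependency index: share of objects in the positive region of the decision."""
    classes = defaultdict(list)
    for row, d in zip(cond_rows, decisions):
        classes[tuple(row)].append(d)
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(decisions)


def discretize(table, decisions, k_max=6, target=1.0):
    """Increase the cluster number until the dependency index reaches target."""
    columns = list(zip(*table))
    for k in range(2, k_max + 1):
        coded = [kmeans_1d(list(col), k) for col in columns]
        rows = list(zip(*coded))
        gamma = dependency(rows, decisions)
        if gamma >= target:
            break
    return k, rows, gamma


def reduce_attributes(rows, decisions):
    """Greedily drop attributes whose removal leaves the dependency index unchanged."""
    keep = list(range(len(rows[0])))
    full = dependency(rows, decisions)
    for a in list(keep):
        trial = [i for i in keep if i != a]
        if trial and dependency([[r[i] for i in trial] for r in rows], decisions) == full:
            keep = trial
    return keep


# Hypothetical decision table: two continuous condition attributes, one decision.
table = [[1.0, 10.0], [1.2, 30.0], [0.9, 50.0],
         [5.0, 11.0], [5.2, 29.0], [4.9, 52.0]]
decisions = ["a", "b", "c", "a", "b", "c"]

k, rows, gamma = discretize(table, decisions)
reduct = reduce_attributes(rows, decisions)
print(k, gamma, reduct)
```

In this toy table the decision follows the second attribute alone, so two clusters per attribute leave some objects indiscernible but differently labeled, while a finer partition makes the table consistent and lets the first attribute be dropped.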