A genetic algorithm for discretization of decision systems

Discretization of real-valued attributes is an important problem in data mining based on rough sets. Rough-set discretization has two particular requirements: the consistency of the decision system must be preserved, and the set of cuts should be as small as possible. The consistent and minimal discretization problem is NP-complete. This paper proposes a genetic algorithm for consistent and minimal discretization of decision systems. Each chromosome is represented as a binary string whose length equals the number of candidate cuts. The fitness function includes two weight factors that balance consistency against the size of the cut set. Experiments show that the algorithm outperforms both the greedy method and the well-known ChiMerge method, and that it solves the consistent and minimal discretization of decision systems effectively.
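The encoding and the two-weight fitness described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual fitness formula, so everything here is an assumption made for illustration: candidate cuts are taken as midpoints between consecutive distinct attribute values, a bit value of 1 is read as "this cut is kept", and the names `candidate_cuts`, `discretize`, `consistency`, `fitness`, `w_consistency`, and `w_size` are hypothetical. Only the fitness evaluation is shown, not the selection, crossover, or mutation steps.

```python
def candidate_cuts(rows):
    """Assumed candidate cuts: midpoints between consecutive distinct values."""
    cuts = []
    n_attrs = len(rows[0][0])
    for a in range(n_attrs):
        values = sorted({x[a] for x, _ in rows})
        cuts.extend((a, (lo + hi) / 2) for lo, hi in zip(values, values[1:]))
    return cuts


def discretize(x, cuts, chromosome):
    """Map an object to interval codes using only the cuts whose bit is 1."""
    return tuple(
        sum(1 for (a, c), bit in zip(cuts, chromosome)
            if bit and a == attr and value > c)
        for attr, value in enumerate(x)
    )


def consistency(rows, cuts, chromosome):
    """Fraction of objects whose discretized description has a single decision."""
    decisions = {}
    for x, d in rows:
        decisions.setdefault(discretize(x, cuts, chromosome), set()).add(d)
    unambiguous = sum(
        1 for x, _ in rows
        if len(decisions[discretize(x, cuts, chromosome)]) == 1
    )
    return unambiguous / len(rows)


def fitness(rows, cuts, chromosome, w_consistency=0.8, w_size=0.2):
    """Hypothetical fitness: weighted sum of consistency and cut-set savings."""
    saved = 1 - sum(chromosome) / len(chromosome)
    return w_consistency * consistency(rows, cuts, chromosome) + w_size * saved


# Toy decision system: (condition attribute values, decision value).
rows = [((1.0, 5.0), 0), ((2.0, 5.0), 0), ((3.0, 6.0), 1), ((4.0, 6.0), 1)]
cuts = candidate_cuts(rows)          # one chromosome bit per candidate cut
keep_all = [1] * len(cuts)           # consistent, but uses every cut
keep_last = [0] * (len(cuts) - 1) + [1]  # a single cut on the second attribute
```

With these assumed weights, a chromosome that keeps one separating cut scores higher than one that keeps all cuts, and a chromosome with no cuts is penalized for losing consistency. The weights would need tuning in practice so that an inconsistent discretization never outscores a consistent one.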
