A Bayesian Discretizer for Real-Valued Attributes

Discretization of real-valued attributes into nominal intervals is an important problem for symbolic induction systems, because many real-world classification tasks involve both symbolic and numerical attributes. Among the various supervised and unsupervised discretization methods, information gain-based methods have been widely used and cited. This paper presents a new discretization method, called the Bayesian discretizer, and compares its performance with the information gain methods implemented in C4.5 and HCV (Version 2.0). On the seven datasets tested, the Bayesian discretizer achieves the best predictive accuracy on four of them.