A global optimal algorithm for class-dependent discretization of continuous data

This paper presents a new method for converting continuous variables into discrete variables for inductive machine learning. The method applies to pattern classification problems in machine learning and data mining. Discretization is formulated as an optimization problem: the objective function is the normalized mutual information, which measures the interdependence between the class labels and the variable to be discretized, and its optimum is found by fractional programming (iterative dynamic programming). Unlike the majority of class-dependent discretization methods in the literature, which find only a local optimum of their objective functions, the proposed method, OCDD (Optimal Class-Dependent Discretization), finds the global optimum. Experimental results show that the algorithm is very effective for classification when coupled with popular learning systems such as C4.5 decision trees and the Naive Bayes classifier. It can be used to discretize continuous variables for many existing inductive learning systems.
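
To make the optimization concrete, the following is a minimal sketch, not the paper's implementation. It assumes the normalized mutual information takes the form I(C;A)/H(C,A), where C is the class label and A the discretized variable, and it assumes a Dinkelbach-style update for the fractional program: for a fixed multiplier lam, dynamic programming maximizes I - lam*H over all contiguous partitions, and lam is then reset to the ratio just achieved. Both I and H are sums of per-interval terms, which is what makes the inner dynamic program possible. The function name `ocdd_sketch` and all parameters are hypothetical.

```python
import math

def ocdd_sketch(values, labels, tol=1e-12, max_iter=100):
    """Return (cut_points, ratio) maximizing the assumed objective I(C;A)/H(C,A)."""
    n = len(values)
    classes = sorted(set(labels))
    cidx = {c: i for i, c in enumerate(classes)}
    distinct = sorted(set(values))
    pos = {v: i for i, v in enumerate(distinct)}
    m = len(distinct)

    # prefix[i][c]: number of class-c samples among the i smallest distinct values
    counts = [[0] * len(classes) for _ in range(m)]
    for v, y in zip(values, labels):
        counts[pos[v]][cidx[y]] += 1
    prefix = [[0] * len(classes)]
    for row in counts:
        prefix.append([a + b for a, b in zip(prefix[-1], row)])
    total = prefix[-1]

    def terms(j, i):
        """Contribution to (I, H) of one interval covering distinct values j..i-1."""
        cnt = [a - b for a, b in zip(prefix[i], prefix[j])]
        size = sum(cnt)
        info = ent = 0.0
        for k, t in zip(cnt, total):
            if k:
                p = k / n
                info += p * math.log(k * n / (t * size))  # p * log(p / (p_class * p_interval))
                ent -= p * math.log(p)
        return info, ent

    def solve(lam):
        """Dynamic program: maximize sum(I_part - lam * H_part) over partitions."""
        best = [(0.0, 0.0, 0.0, -1)]  # (score, I, H, back-pointer)
        for i in range(1, m + 1):
            cand = None
            for j in range(i):
                info, ent = terms(j, i)
                score = best[j][0] + info - lam * ent
                if cand is None or score > cand[0]:
                    cand = (score, best[j][1] + info, best[j][2] + ent, j)
            best.append(cand)
        cuts, i = [], m
        while i > 0:  # follow back-pointers to recover interval boundaries
            j = best[i][3]
            if j > 0:
                cuts.append((distinct[j - 1] + distinct[j]) / 2)
            i = j
        return best[m][1], best[m][2], sorted(cuts)

    lam = 0.0
    for _ in range(max_iter):
        info, ent, cuts = solve(lam)
        ratio = info / ent if ent > 0 else 0.0
        if ratio <= lam + tol:  # no further improvement: lam is the optimal ratio
            break
        lam = ratio
    return cuts, ratio

cuts, ratio = ocdd_sketch([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(cuts, ratio)
```

In this toy input the two classes are perfectly separated, so a single cut midway between 3 and 4 makes the discretized variable and the class label determine each other. The sketch places cut points at midpoints between adjacent distinct values and runs in time quadratic in the number of distinct values per iteration; both are choices made here for brevity, not details taken from the paper.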
