Evaluating the performance of cost-based discretization versus entropy- and error-based discretization

Discretization is defined as the process that divides continuous numeric values into intervals of discrete categorical values. In this article, the concept of cost-based discretization as a pre-processing step to the induction of a classifier is introduced in order to obtain an optimal multi-interval splitting for each numeric attribute. A transparent description of the method and the steps involved in cost-based discretization are given. The aim of this paper is to present this method and to assess the potential benefits of such an approach. Furthermore, its performance against two other well-known methods, i.e. entropy-and pure error-based discretization is examined. To this end, experiments on 14 data sets, taken from the UCI Repository on Machine Learning were carried out. In order to compare the different methods, the area under the Receiver Operating Characteristic (ROC) graph was used and tested on its level of significance. For most data sets the results show that cost-based discretization achieves satisfactory results when compared to entropy-and error-based discretization.Given its importance, many researchers have already contributed to the issue of discretization in the past. However, to the best of our knowledge, no efforts have been made yet to include the concept of misclassification costs to find an optimal multi-split for discretization purposes, prior to induction of the decision tree. For this reason, this new concept is introduced and explored in this article by means of operations research techniques.

[1]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[2]  Jason Catlett,et al.  On Changing Continuous Attributes into Ordered Discrete Attributes , 1991, EWSL.

[3]  Jukka Hekanaho DOGMA: A GA-Based Relational Learner , 1998, ILP.

[4]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[5]  Erick Cantú-Paz,et al.  Supervised and unsupervised discretization methods for evolutionary algorithms , 2001 .

[6]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[7]  Tapio Elomaa,et al.  General and Efficient Multisplitting of Numerical Attributes , 1999, Machine Learning.

[8]  Tapio Elomaa,et al.  Finding Optimal Multi-Splits for Numerical Attributes in Decision Tree Learning , 1996 .

[9]  Randy Kerber,et al.  ChiMerge: Discretization of Numeric Attributes , 1992, AAAI.

[10]  Simon Kasif,et al.  Efficient Algorithms for Finding Multi-way Splits for Decision Trees , 1995, ICML.

[11]  J. Hanley,et al.  A method of comparing the areas under receiver operating characteristic curves derived from the same cases. , 1983, Radiology.

[12]  Wolfgang Maass,et al.  Efficient agnostic PAC-learning with simple hypothesis , 1994, COLT '94.

[13]  Stephen D. Bay Multivariate Discretization for Set Mining , 2001, Knowledge and Information Systems.

[14]  Jan Komorowski,et al.  Principles of Data Mining and Knowledge Discovery , 2001, Lecture Notes in Computer Science.

[15]  Changhwan Lee,et al.  A Context-Sensitive Discretization of Numeric Attributes for Classification Learning , 1994, ECAI.

[16]  Tapio Elomaa,et al.  Preprocessing opportunities in optimal numerical range partitioning , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[17]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[18]  David W. Aha,et al.  A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms , 1997, Artificial Intelligence Review.

[19]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res..

[20]  Bernhard Pfahringer,et al.  Compression-Based Discretization of Continuous Attributes , 1995, ICML.

[21]  Pat Langley,et al.  Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), June 29- July 2, 2000, Stanford University , 2000 .

[22]  M. Kendall A NEW MEASURE OF RANK CORRELATION , 1938 .

[23]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[24]  Tapio Elomaa,et al.  Fast Minimum Training Error Discretization , 2002, ICML.

[25]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[26]  David J. Hand,et al.  A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems , 2001, Machine Learning.

[27]  Kai Ming Ting,et al.  Inducing Cost-Sensitive Trees via Instance Weighting , 1998, PKDD.

[28]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[29]  Luc De Raedt,et al.  Lookahead and Discretization in ILP , 1997, ILP.

[30]  Jorma Rissanen,et al.  Stochastic Complexity in Statistical Inquiry , 1989, World Scientific Series in Computer Science.

[31]  Usama M. Fayyad,et al.  On the Handling of Continuous-Valued Attributes in Decision Tree Generation , 1992, Machine Learning.

[32]  Frederick S. Hillier,et al.  Introduction of Operations Research , 1967 .

[33]  Tapio Elomaa,et al.  Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates , 2004, Data Mining and Knowledge Discovery.

[34]  Tapio Elomaa,et al.  Generalizing Boundary Points , 2000, AAAI/IAAI.

[35]  Jerzy W. Grzymala-Busse,et al.  Global discretization of continuous attributes as preprocessing for machine learning , 1996, Int. J. Approx. Reason..

[36]  Thierry Van de Merckt Decision Trees in Numerical Attribute Spaces , 1993, IJCAI.

[37]  Nir Friedman,et al.  Discretizing Continuous Attributes While Learning Bayesian Networks , 1996, ICML.

[38]  Ron Kohavi,et al.  Error-Based and Entropy-Based Discretization of Continuous Features , 1996, KDD.

[39]  Robert C. Holte,et al.  Very Simple Classification Rules Perform Well on Most Commonly Used Datasets , 1993, Machine Learning.

[40]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..