Discretization Techniques: A recent survey

Handling problems with real-valued attributes in Decision Trees (DTs), Bayesian Networks (BNs) and Rule-Learners (RLs) requires a discretization algorithm, which divides each attribute's range into intervals that are then treated as nominal values. The performance of these systems depends on choosing these intervals well. A good discretization algorithm has to balance the loss of information intrinsic to this kind of process against the generation of a reasonable number of cut points, that is, a reasonable search space. This paper reviews the well-known discretization techniques. Of course, a single article cannot be a complete review of all discretization algorithms. Despite this, we hope that the references cited cover the major theoretical issues, guide the researcher to interesting research directions, and suggest possible combinations that remain to be explored.
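To make the idea concrete, here is a minimal Python sketch, not the paper's own method, of two simple unsupervised schemes often used as baselines: equal-width and equal-frequency binning. Each produces cut points, and `discretize` replaces every real value with a nominal interval label. The `class_entropy` helper is an illustrative way to gauge how much class information a discretization loses, in the spirit of entropy-based methods. All function names here are hypothetical, and the toy data is invented for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def equal_width_cuts(values, k):
    """k-1 cut points splitting [min, max] into k intervals of equal width."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Up to k-1 cut points so that each interval holds roughly len(values)/k values."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    cuts = []
    for i in range(1, k):
        j = i * n // k  # index of the first value that should fall past the i-th cut
        # A cut can only separate distinct values; ties at the boundary yield no cut,
        # so fewer than k intervals may result.
        if ordered[j - 1] < ordered[j]:
            cuts.append((ordered[j - 1] + ordered[j]) / 2)
    return cuts


def discretize(values, cuts):
    """Replace each real value with a nominal label naming its interval.

    With sorted cuts c1 < c2 < ..., label I0 covers (-inf, c1), I1 covers [c1, c2), etc.
    """
    return [f"I{bisect_right(cuts, v)}" for v in values]


def class_entropy(labels, classes):
    """Size-weighted class entropy over the intervals (0 means every interval is class-pure)."""
    n = len(labels)
    total = 0.0
    for interval in set(labels):
        counts = Counter(c for lab, c in zip(labels, classes) if lab == interval)
        size = sum(counts.values())
        total += (size / n) * -sum((m / size) * log2(m / size) for m in counts.values())
    return total


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy attribute with one outlier
    classes = ["a", "a", "b", "b", "b"]
    for make_cuts in (equal_width_cuts, equal_frequency_cuts):
        cuts = make_cuts(values, 2)
        print(make_cuts.__name__, cuts, class_entropy(discretize(values, cuts), classes))
    # equal_width_cuts [50.5] 0.8
    # equal_frequency_cuts [2.5] 0.0
```

On this toy data the outlier stretches the equal-width intervals, so one interval mixes both classes, while the equal-frequency cut happens to separate them. Neither scheme looks at the class labels, which is why supervised methods that choose cut points using class information are a central topic of the survey.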
