Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates

We consider multisplitting of numerical value ranges, a task that arises as a discretization step preceding induction and is also embedded into learning algorithms. We are interested in finding the partition that optimizes the value of a given attribute evaluation function. For most commonly used evaluation functions this task takes time quadratic in the number of potential cut points in the numerical range, making it a potential bottleneck in data mining algorithms.

We present two techniques that speed up the optimal multisplitting task. The first discards cut point candidates in a quick linear-time preprocessing scan before the actual search begins. We generalize Fayyad and Irani's definition of boundary points so that adjacent example blocks with the same relative class distribution can be merged. We prove for several commonly used evaluation functions that this preprocessing removes only suboptimal cut points; hence, the algorithm does not lose optimality.

Our second technique tackles the quadratic-time dynamic programming algorithm, which is the best known scheme for optimizing many well-known evaluation functions. We present a method that dynamically, i.e., during the search, prunes partitions of prefixes of the sorted data from the algorithm's search space. The method works for all convex and cumulative evaluation functions.

Together these two techniques speed up the multisplitting process considerably. Compared to the baseline dynamic programming algorithm, the speed-up is around 50 percent on average and up to 90 percent in some cases. We conclude that optimal multisplitting is fully feasible on all benchmark data sets we have encountered.
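
To illustrate the preprocessing idea, the following Python sketch groups a sorted numerical attribute into bins of equal value and merges adjacent bins that have identical relative class distributions; only the borders between the surviving segments remain as candidate cut points. The function name and data representation are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from fractions import Fraction

def candidate_cut_indices(values, labels):
    """Return the example indices at which a cut remains a candidate
    after merging adjacent value blocks with equal relative class
    distributions.  `values` must be sorted in ascending order and
    `labels` aligned with it.  Illustrative sketch only."""
    # 1. Group examples into bins of equal attribute value.
    bins = []                       # list of (value, Counter of labels)
    for v, y in zip(values, labels):
        if bins and bins[-1][0] == v:
            bins[-1][1][y] += 1
        else:
            bins.append((v, Counter({y: 1})))

    # 2. Merge adjacent bins whose relative class distributions are equal.
    def relative(dist):
        total = sum(dist.values())
        return {c: Fraction(n, total) for c, n in dist.items()}

    segments = []                   # list of (example count, Counter)
    for _, dist in bins:
        if segments and relative(segments[-1][1]) == relative(dist):
            count, merged = segments[-1]
            segments[-1] = (count + sum(dist.values()), merged + dist)
        else:
            segments.append((sum(dist.values()), dist))

    # 3. Only the borders between the surviving segments are candidates.
    cuts, seen = [], 0
    for count, _ in segments[:-1]:
        seen += count
        cuts.append(seen)
    return cuts
```

For example, with values [1, 1, 2, 3, 3, 4] and labels [a, a, a, b, b, b] the function returns [3]: the all-a bins merge into one segment and the all-b bins into another, so the single border between them is the only surviving cut point candidate.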

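To make the search scheme concrete, the sketch below shows the baseline quadratic-time dynamic program over prefixes of the merged segments, using training-set-weighted class entropy as one example of a cumulative evaluation function. The paper's second technique prunes dominated prefix partitions from exactly this kind of table during the search; that pruning is omitted here, and the names and choice of impurity are assumptions for illustration only.

```python
import math

def class_entropy(dist):
    """Entropy of a class-frequency dictionary (illustrative impurity)."""
    n = sum(dist.values())
    return -sum((f / n) * math.log2(f / n) for f in dist.values() if f)

def optimal_multisplit(segment_dists, max_arity):
    """Baseline quadratic-time dynamic program over the merged segments.

    segment_dists : per-segment class-frequency dicts, in attribute order
                    (the blocks that survive the preprocessing step)
    max_arity     : maximum number of intervals k
    Returns the minimum weighted average class entropy over all partitions
    into at most `max_arity` intervals.  Sketch only; the paper's dynamic
    pruning of dominated prefixes is not included.
    """
    B = len(segment_dists)
    n_total = sum(sum(d.values()) for d in segment_dists)
    classes = sorted({c for d in segment_dists for c in d})

    # Prefix class counts so any interval's distribution is O(#classes).
    prefix = [{c: 0 for c in classes}]
    for d in segment_dists:
        nxt = dict(prefix[-1])
        for c, f in d.items():
            nxt[c] += f
        prefix.append(nxt)

    def cost(j, i):
        """Weighted impurity of the interval covering segments j .. i-1."""
        merged = {c: prefix[i][c] - prefix[j][c] for c in classes}
        weight = sum(merged.values()) / n_total
        return weight * class_entropy(merged)

    INF = float("inf")
    # best[t][i]: best value for splitting the first i segments into t intervals
    best = [[INF] * (B + 1) for _ in range(max_arity + 1)]
    best[0][0] = 0.0
    for t in range(1, max_arity + 1):
        for i in range(t, B + 1):
            best[t][i] = min(best[t - 1][j] + cost(j, i)
                             for j in range(t - 1, i))
    return min(best[t][B] for t in range(1, max_arity + 1))
```

Because the weighted entropy of a partition is a sum of per-interval terms, i.e., a cumulative evaluation function, the optimum over the first i segments decomposes over prefixes; this is the property the dynamic program exploits and on which the paper's pruning condition operates.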