Model selection in omnivariate decision trees using Structural Risk Minimization

As opposed to trees that use a single type of decision node, an omnivariate decision tree contains nodes of different types. We propose to use Structural Risk Minimization (SRM) to choose between node types in omnivariate decision tree construction, matching the complexity of a node to the complexity of the data reaching that node. To apply SRM for model selection, one needs the VC-dimension of the candidate models. In this paper, we first derive the VC-dimension of the univariate model and estimate the VC-dimension of all three models (univariate, linear multivariate, and quadratic multivariate) experimentally. Second, we compare SRM with other model selection techniques, including Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and cross-validation (CV), on standard datasets from the UCI and Delve repositories. We find that SRM induces omnivariate trees with a small percentage of multivariate nodes, placed close to the root, and that these trees generalize at least as accurately as those constructed using the other model selection techniques.
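
To make the selection step concrete, below is a minimal sketch of SRM-based node-type selection, assuming Vapnik's standard guaranteed-risk form for the penalized error. The constants a1 and a2, the confidence level nu, the function names, and all VC-dimension and error values are illustrative assumptions for the sketch, not the paper's exact formulation or measured values.

```python
import math

def srm_penalized_error(train_error, h, n, a1=1.0, a2=1.0, nu=0.05):
    """Vapnik's guaranteed (penalized) error: the training error plus a
    confidence term that grows with the VC-dimension h and shrinks with
    the number of training instances n. Constants a1, a2 and the
    confidence level nu are illustrative, not the paper's values."""
    eps = a1 * (h * (math.log(a2 * n / h) + 1.0) - math.log(nu)) / n
    return train_error + (eps / 2.0) * (1.0 + math.sqrt(1.0 + 4.0 * train_error / eps))

def select_node_type(candidates, n):
    """Pick the candidate split whose SRM bound is smallest at this node.
    candidates is a list of (name, train_error, vc_dimension) tuples."""
    return min(candidates, key=lambda c: srm_penalized_error(c[1], c[2], n))

# Hypothetical numbers at a node reached by n = 500 training instances:
# the quadratic split fits best but is far more complex.
candidates = [
    ("univariate", 0.12, 6),    # low complexity, highest training error
    ("linear",     0.09, 12),   # medium complexity
    ("quadratic",  0.07, 78),   # high complexity, lowest training error
]
best = select_node_type(candidates, n=500)
print("selected node type:", best[0])
```

With these hypothetical numbers the univariate split wins: its extra training error is more than offset by its much smaller confidence penalty. This mirrors the result above that SRM keeps most nodes simple, affording multivariate nodes only where enough data reaches the node, typically close to the root.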
