A Novel Hyperparameter-Free Approach to Decision Tree Construction That Avoids Overfitting by Design

Decision trees are an extremely popular machine learning technique. Unfortunately, overfitting remains an open issue that can prevent decision trees from achieving good performance. In this paper, we present a novel approach to decision tree construction that avoids overfitting by design, without losing accuracy. A distinctive feature of our algorithm is that it requires neither hyperparameter optimization nor regularization techniques, which significantly reduces training time. Moreover, our algorithm produces much smaller and shallower trees than traditional algorithms, which facilitates the interpretability of the resulting models. For reproducibility, we provide an open-source implementation of the algorithm.
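To make the training-time claim concrete, the sketch below illustrates the hyperparameter search that conventional decision tree training typically requires and that the proposed approach dispenses with. This is not the paper's algorithm; the dataset, grid, and split are illustrative assumptions, and the final commented call names a hypothetical interface.

```python
# Minimal sketch (not the paper's method): the cost of tuning a standard
# decision tree versus a single hyperparameter-free fit.
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Traditional workflow: good generalization usually requires a costly
# cross-validated search over pruning/regularization hyperparameters.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 5, 20],
        "ccp_alpha": [0.0, 0.001, 0.01],
    },
    cv=5,
)
grid.fit(X_train, y_train)
print("tuned tree accuracy:", grid.score(X_test, y_test))

# The approach described in the paper would replace this whole search with a
# single call requiring no hyperparameters and no post-pruning, e.g.
# (hypothetical interface): model = HyperparameterFreeTree().fit(X_train, y_train)
```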
