Learning Optimal Decision Trees from Large Datasets

Inferring a decision tree from a given dataset is one of the classic problems in machine learning. The problem consists of building, from a labelled dataset, a tree such that each leaf corresponds to a class and each path from the root to a leaf corresponds to a conjunction of feature tests that must be satisfied for that class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is known to be NP-complete for several definitions of optimality. Hence, the majority of existing approaches rely on heuristics, and the few exact inference approaches do not scale to large datasets. In this paper, we propose a novel approach for inferring a decision tree of minimum depth, based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and that its running time grows slowly with the size of the dataset.
