Efficient Non-greedy Optimization of Decision Trees

Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy procedure often leads to suboptimal trees. In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective. We show that the problem of finding optimal linear-combination (oblique) splits for decision trees is related to structured prediction with latent variables, and we formulate a convex-concave upper bound on the tree's empirical loss. Computing the gradient of the proposed surrogate objective with respect to each training exemplar is O(d2), where d is the tree depth, and thus training deep trees is feasible. The use of stochastic gradient descent for optimization enables effective training with large datasets. Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines.

[1]  Ronald L. Rivest,et al.  Constructing Optimal Binary Decision Trees is NP-Complete , 1976, Inf. Process. Lett..

[2]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[3]  Steven L. Salzberg,et al.  On growing better decision trees from data , 1996 .

[4]  K. Bennett,et al.  A support vector machine approach to decision trees , 1998, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227).

[5]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[6]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[7]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[8]  Nello Cristianini,et al.  Enlarging the Margins in Perceptron Decision Trees , 2000, Machine Learning.

[9]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[10]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[11]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[12]  John Mingers,et al.  An Empirical Comparison of Pruning Methods for Decision Tree Induction , 1989, Machine Learning.

[13]  Kristin P. Bennett,et al.  Global Tree Optimization: A Non-greedy Decision Tree Algorithm , 2007 .

[14]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[15]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[16]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[17]  David J. Fleet,et al.  Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.

[18]  Luc Van Gool,et al.  Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[20]  Ben Glocker,et al.  Neighbourhood Approximation Forests , 2012, MICCAI.

[21]  David J. Fleet,et al.  Hamming Distance Metric Learning , 2012, NIPS.

[22]  Sebastian Nowozin,et al.  Improved Information Gain Estimates for Decision Tree Induction , 2012, ICML.

[23]  Sebastian Nowozin,et al.  Loss-Specific Training of Non-Parametric Image Restoration Models: A New State of the Art , 2012, ECCV.

[24]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Antonio Criminisi,et al.  Decision Forests for Computer Vision and Medical Image Analysis , 2013, Advances in Computer Vision and Pattern Recognition.

[26]  Yee Whye Teh,et al.  Mondrian Forests: Efficient Online Random Forests , 2014, NIPS.

[27]  David J. Fleet,et al.  CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits , 2015, ArXiv.