Decision Jungles: Compact and Rich Models for Classification

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.
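To make the core idea concrete, here is a minimal sketch (not the authors' implementation; all class and function names are hypothetical) of a rooted decision DAG in which two internal nodes share a child, so that multiple root-to-leaf paths reach the same leaf, together with the training objective the abstract describes: the weighted sum of Shannon entropies over the class distributions at the leaves.

```python
import math
from collections import Counter

class Node:
    """One node of a rooted decision DAG. Unlike a tree, the same Node
    object may appear as the child of several parents."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index of the feature tested at this node
        self.threshold = threshold  # route left if x[feature] < threshold
        self.left = left
        self.right = right
        self.label = label          # set only for leaf nodes

    def is_leaf(self):
        return self.label is not None

def route(node, x):
    """Follow the single path that sample x takes down to a leaf."""
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.threshold else node.right
    return node

def weighted_leaf_entropy(root, X, y):
    """Objective from the abstract: sum over leaves of |S| * H(S),
    where S is the set of training labels routed to that leaf and
    H is the Shannon entropy of its class distribution."""
    per_leaf = {}
    for x, label in zip(X, y):
        per_leaf.setdefault(id(route(root, x)), []).append(label)
    total = 0.0
    for labels in per_leaf.values():
        n = len(labels)
        h = -sum(c / n * math.log2(c / n) for c in Counter(labels).values())
        total += n * h
    return total

# Two parents sharing one leaf: multiple paths to the same node,
# which is exactly what a tree forbids and a DAG allows.
shared = Node(label=1)
left_child = Node(feature=1, threshold=0.5, left=Node(label=0), right=shared)
right_child = Node(feature=1, threshold=0.5, left=shared, right=Node(label=0))
root = Node(feature=0, threshold=0.5, left=left_child, right=right_child)
```

In the paper's training procedure, both split-parameter choices and node merges are scored against this same objective; the sketch above only illustrates how sharing leaves keeps the node count from doubling at every level while still routing each sample along one deterministic path.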