Learning a Tree of Neural Nets

Much of the success of deep learning is due to choosing good neural net architectures and being able to train them effectively. An architecture that has long been sought is one that combines decision trees and neural nets. This is straightforward if the tree makes soft decisions (i.e., an input instance follows every path in the tree with some probability), because the resulting model is differentiable. If the tree makes hard decisions, the optimization is much harder, but the resulting architecture is much faster at inference, since an instance follows a single root-to-leaf path. We show that it is possible to train such architectures with a guaranteed monotonic decrease of the loss, and demonstrate this by learning trees with linear decision nodes and deep nets at the leaves. The resulting architecture improves over state-of-the-art deep nets, achieving comparable or lower classification error with fewer parameters and faster inference. In particular, we show that, rather than improving a ResNet by making it deeper, it is better to build a tree of small ResNets. The resulting tree-net hybrid is also more interpretable.
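
To make the hard-decision routing concrete, here is a minimal sketch (not the authors' code) of inference through such a tree: each internal node applies a linear (oblique) test to the input and sends it down exactly one branch, so only a single leaf net is evaluated. The class `Node`, the callable `leaf_model`, and the parameters `w`, `b` are illustrative assumptions; in the paper the leaf models are small deep nets such as ResNets.

```python
# Hedged sketch: hard-decision tree with linear decision nodes and a
# neural net at each leaf. At inference, an input follows one path only.
import numpy as np

class Node:
    def __init__(self, w=None, b=0.0, left=None, right=None, leaf_model=None):
        self.w, self.b = w, b          # linear decision: go right if w.x + b > 0
        self.left, self.right = left, right
        self.leaf_model = leaf_model   # e.g. a small ResNet; here, any callable

def predict(node, x):
    """Hard routing: x follows a single root-to-leaf path."""
    while node.leaf_model is None:
        node = node.right if float(node.w @ x + node.b) > 0 else node.left
    return node.leaf_model(x)          # only this one leaf net is evaluated

# Toy usage: two stand-in "leaf nets".
leaf0 = Node(leaf_model=lambda x: "class A")
leaf1 = Node(leaf_model=lambda x: "class B")
root = Node(w=np.array([1.0, -1.0]), b=0.0, left=leaf0, right=leaf1)
print(predict(root, np.array([0.2, 0.9])))   # w.x + b = -0.7 <= 0, routed to leaf0 -> "class A"
```

Because only one root-to-leaf path is executed per input, inference cost scales with the depth of the tree plus a single leaf net, rather than with the total number of leaves, which is what gives the hard-decision tree its speed advantage over soft-decision variants.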
