NBDT: Neural-Backed Decision Trees

Machine learning applications in domains such as finance and medicine demand accurate and justifiable predictions, barring most deep learning methods from use. In response, previous work combines decision trees with deep learning, yielding models that either (1) sacrifice interpretability to maintain accuracy or (2) underperform modern neural networks to maintain interpretability. We forgo this dilemma by proposing Neural-Backed Decision Trees (NBDTs), modified hierarchical classifiers that use trees constructed in weight space. Our NBDTs achieve (1) interpretability and (2) neural network accuracy: We preserve interpretable properties -- e.g., leaf purity and a non-ensembled model -- and demonstrate interpretability of model predictions both qualitatively and quantitatively. Furthermore, NBDTs match state-of-the-art neural networks on CIFAR10, CIFAR100, TinyImageNet, and ImageNet to within 1-2%. This yields state-of-the-art interpretable models on ImageNet, with NBDTs besting all decision-tree-based methods by ~14% to attain 75.30% top-1 accuracy. Code and pretrained NBDTs are available at this https URL.
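
The abstract describes hierarchical classification over a tree built in the weight space of the backbone's final linear layer. The sketch below illustrates one plausible reading of that idea: leaf vectors are the class rows of the final linear layer, inner-node vectors average their children, and "soft" inference multiplies per-node child probabilities along each root-to-leaf path. The helper names, the fixed toy hierarchy, and the mean-of-children rule are illustrative assumptions, not the authors' released implementation.

import numpy as np

# Minimal sketch of NBDT-style soft inference over an induced hierarchy.
# Assumptions (not taken from the paper text): inner-node vectors are the
# mean of their children's vectors, the hierarchy is a nested tuple of class
# indices, and `feature` is whatever the backbone produces before its final
# linear layer, so that (class weight) @ feature gives the class logit.

class Node:
    def __init__(self, children=None, class_index=None, vector=None):
        self.children = children or []
        self.class_index = class_index   # set only for leaves
        self.vector = vector             # representative weight vector

def build_tree(hierarchy, class_weights):
    """Recursively build nodes; inner-node vectors average their children."""
    if isinstance(hierarchy, int):                      # leaf: a class index
        return Node(class_index=hierarchy, vector=class_weights[hierarchy])
    children = [build_tree(h, class_weights) for h in hierarchy]
    vector = np.mean([c.vector for c in children], axis=0)
    return Node(children=children, vector=vector)

def soft_inference(root, feature):
    """Multiply per-node child probabilities along each root-to-leaf path."""
    class_probs = {}

    def descend(node, path_prob):
        if node.class_index is not None:
            class_probs[node.class_index] = path_prob
            return
        logits = np.array([child.vector @ feature for child in node.children])
        probs = np.exp(logits - logits.max())            # stable softmax
        probs /= probs.sum()
        for child, p in zip(node.children, probs):
            descend(child, path_prob * p)

    descend(root, 1.0)
    return class_probs

# Toy usage: 4 classes, 8-dim features, hypothetical hierarchy ((0, 1), (2, 3)).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # stand-in for the final linear layer weights
x = rng.normal(size=8)           # stand-in for a backbone feature vector
tree = build_tree(((0, 1), (2, 3)), W)
print(soft_inference(tree, x))   # class index -> path probability

Because every prediction is a product of decisions at named inner nodes, each class probability can be traced back through the tree, which is the interpretability property the abstract emphasizes.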
