Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning

Interpretability and effectiveness are two essential requirements for adopting machine learning methods in real-world applications. In this paper, we propose a knowledge-distillation-based extension of decision trees, dubbed rectified decision trees (ReDT), to explore the possibility of fulfilling both requirements simultaneously. Specifically, we extend the splitting criteria and the stopping condition of standard decision trees so that they can be trained with soft labels while preserving deterministic splitting paths. We then train the ReDT on soft labels distilled from a well-trained teacher model through a novel jackknife-based method. Accordingly, ReDT preserves the excellent interpretability of decision trees while achieving relatively good performance. The effectiveness of adopting soft labels instead of hard ones is analyzed both empirically and theoretically. Surprisingly, experiments indicate that introducing soft labels also reduces model size compared with standard decision trees, measured by the total number of nodes and rules: an unexpected gift from the 'dark knowledge' distilled from the teacher model.
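
The paper defines the exact extended criterion; as a minimal sketch of the general idea only, assume the soft labels are teacher-produced class-probability vectors and a node's impurity is computed from its mean soft-label distribution (the function names `soft_gini` and `split_gain` below are illustrative, not from the paper):

```python
import numpy as np

def soft_gini(soft_labels):
    """Gini impurity of a node, computed from soft labels.

    soft_labels: (n_samples, n_classes) array of teacher probabilities.
    The mean over samples acts as a pseudo class distribution; with
    one-hot (hard) labels this reduces to the standard Gini index.
    """
    p = soft_labels.mean(axis=0)
    return 1.0 - np.sum(p ** 2)

def split_gain(soft_labels, feature, threshold):
    """Impurity decrease of a candidate axis-aligned split.

    feature: (n_samples,) values of one feature, aligned with soft_labels.
    The split itself stays deterministic; only the impurity uses soft labels.
    """
    left = feature <= threshold
    right = ~left
    if left.sum() == 0 or right.sum() == 0:
        return 0.0
    n = len(feature)
    children = (left.sum() / n) * soft_gini(soft_labels[left]) \
             + (right.sum() / n) * soft_gini(soft_labels[right])
    return soft_gini(soft_labels) - children

# toy usage: scan candidate thresholds for one feature
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p1 = 1.0 / (1.0 + np.exp(-3.0 * x))          # stand-in teacher outputs
teacher_probs = np.stack([p1, 1.0 - p1], axis=1)
best_t = max(np.quantile(x, np.linspace(0.1, 0.9, 9)),
             key=lambda t: split_gain(teacher_probs, x, t))
```

A full ReDT would additionally need the jackknife-based generation of the teacher's soft labels and the adapted stopping condition; those details are specified in the paper itself.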
