NDT: Neural Decision Tree Towards Fully Functioned Neural Graph

Although traditional algorithms can be embedded into neural architectures under the principle proposed in \cite{xiao2017hungarian}, variables that occur only in the condition of a branch form a special case that cannot be updated. To tackle this issue, we multiply the conditioned branches by the Dirac symbol (i.e., $\mathbf{1}_{x>0}$) and then approximate the Dirac symbol with a continuous function (e.g., $1 - e^{-\alpha|x|}$). In this way, the gradients of condition-specific variables can be worked out approximately in the back-propagation process, yielding a fully functioned neural graph. With this novel principle, we propose the neural decision tree \textbf{(NDT)}, which takes simplified neural networks as the decision function in each branch and employs complex neural networks to generate the output in each leaf. Extensive experiments verify our theoretical analysis and demonstrate the effectiveness of our model.
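To see why the surrogate makes condition-only variables trainable, consider a minimal worked sketch (the two-branch form, the decision score $s$, and the branch functions $f$ and $g$ are illustrative assumptions, not notation from the paper). Let
\[
y = \mathbf{1}_{s>0}\, f(x) + \bigl(1 - \mathbf{1}_{s>0}\bigr)\, g(x), \qquad s = w^{\top} x,
\]
where $w$ occurs only in the condition. The indicator has zero gradient almost everywhere, so back-propagation never updates $w$. Substituting the continuous surrogate $\sigma_{\alpha}(s) = 1 - e^{-\alpha |s|}$ yields
\[
\frac{\partial y}{\partial w} = \frac{\partial \sigma_{\alpha}}{\partial s}\,\bigl(f(x) - g(x)\bigr)\, x, \qquad \frac{\partial \sigma_{\alpha}}{\partial s} = \alpha \,\operatorname{sign}(s)\, e^{-\alpha |s|},
\]
which is nonzero for every $s \neq 0$, so $w$ receives an approximate but usable gradient, with $\alpha$ controlling how sharply the surrogate approaches the hard branch.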

[1] Ji Feng, et al. Deep Forest: Towards An Alternative to Deep Neural Networks, 2017, IJCAI.

[2] Marvin Minsky, et al. Perceptrons: An Introduction to Computational Geometry, 1969.

[3] Zhi-Hua Zhou, et al. Isolation Forest, 2008, IEEE International Conference on Data Mining (ICDM).

[4] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[5] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.

[6] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[7] Zhuowen Tu, et al. Deeply-Supervised Nets, 2014, AISTATS.

[8] Han Xiao. Hungarian Layer: Logics Empowered Neural Architecture, 2017, ArXiv.

[9] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[10] Johannes Welbl, et al. Casting Random Forests as Artificial Neural Networks (and Profiting from It), 2014, GCPR.

[11] Thomas Brox, et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[12] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[13] Jürgen Schmidhuber, et al. Training Very Deep Networks, 2015, NIPS.

[14] Qiang Chen, et al. Network In Network, 2013, ICLR.

[15] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[16] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.

[17] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[18] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[19] Jimmy J. Lin, et al. Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, 2016, CIKM.

[20] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[21] Jiri Matas, et al. All you need is a good init, 2015, ICLR.