SPINE: Soft Piecewise Interpretable Neural Equations

ReLU fully connected networks are ubiquitous but uninterpretable because they fit piecewise linear functions that emerge from multi-layered structures and complex interactions among model weights. This paper takes a novel approach to piecewise fitting by applying set operations to the individual "pieces" (parts). It does so by approximating canonical normal forms and using the resulting expression as the model. This yields several advantages: (a) a strong correspondence between parameters and pieces of the fitted function (high interpretability); (b) the ability to fit any combination of continuous functions as pieces of the piecewise function (ease of design); (c) the ability to add new non-linearities in a targeted region of the domain (targeted learning); and (d) the simplicity of a single equation that avoids layering. The model can also be expressed in the general max-min representation of piecewise linear functions, which lends it theoretical tractability and grounding. The architecture is tested on simulated regression and classification tasks as well as benchmark datasets, including UCI datasets, MNIST, Fashion-MNIST, and CIFAR-10. Its performance is on par with that of fully connected architectures, and it can find a variety of applications wherever fully connected layers must be replaced by interpretable layers.
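To make the max-min formulation concrete, below is a minimal sketch (not the paper's implementation) of a soft piecewise model in the spirit described above: each affine piece w_j . x + b_j is a directly readable parameter group, pieces are combined with smooth min/max operators (here log-sum-exp, one common choice for soft set operations), and the whole model is a single equation with no layering. The names max_min_model, soft_max, soft_min, the sharpness parameter beta, and the grouping scheme are illustrative assumptions, not taken from the paper.

    import numpy as np
    from scipy.special import logsumexp

    def soft_max(z, beta=10.0, axis=-1):
        # Smooth, differentiable surrogate for max via scaled log-sum-exp.
        return logsumexp(beta * z, axis=axis) / beta

    def soft_min(z, beta=10.0, axis=-1):
        # Smooth surrogate for min: negate, take the soft max, negate back.
        return -soft_max(-z, beta=beta, axis=axis)

    def max_min_model(x, W, b, groups, beta=10.0):
        """Evaluate f(x) = max_i min_{j in S_i} (w_j . x + b_j) with soft
        max/min, so every parameter pair (w_j, b_j) maps to one piece.

        x: (n, d) inputs; W: (p, d) piece slopes; b: (p,) piece offsets;
        groups: list of index lists S_i choosing which affine pieces each
        soft-min term combines.
        """
        planes = x @ W.T + b                                   # (n, p) affine pieces
        inner = np.stack([soft_min(planes[:, S], beta=beta)    # one value per group
                          for S in groups], axis=-1)           # (n, len(groups))
        return soft_max(inner, beta=beta)                      # (n,)

    # Toy usage: four affine pieces combined into two min-groups.
    x = np.random.randn(5, 2)
    W = np.random.randn(4, 2)
    b = np.random.randn(4)
    y = max_min_model(x, W, b, groups=[[0, 1], [2, 3]])

As beta grows, the soft operators approach the hard max and min, so the fit approaches an exact piecewise linear function; at moderate beta the model stays smooth and trainable by gradient descent.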
