Convolutional Dynamic Alignment Networks for Interpretable Classifications

We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a high degree of inherent interpretability. Their core building blocks are Dynamic Alignment Units (DAUs), which linearly transform their input with weight vectors that dynamically align with task-relevant patterns. As a result, CoDA-Nets model the classification prediction through a series of input-dependent linear transformations, allowing for a linear decomposition of the output into individual input contributions. Given the alignment of the DAUs, the resulting contribution maps align with discriminative input patterns. These model-inherent decompositions are of high visual quality and outperform existing attribution methods under quantitative metrics. Further, CoDA-Nets are performant classifiers, achieving results on par with ResNet and VGG models on, e.g., CIFAR-10 and TinyImageNet.
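The claim that a stack of input-dependent linear maps yields an exact linear decomposition of the output can be illustrated with a small sketch. It makes assumptions that this abstract does not state. Each DAU's dynamic weight vector is taken to be w(x) = (A x + b) / ||A x + b||, an affine map of the input rescaled to unit norm. The layers are plain fully connected stacks of DAUs rather than convolutional ones. Names such as `dau_weights` and `forward_with_contributions` are hypothetical. This is an illustration of the idea, not the authors' implementation.

```python
import math
import random

# Hypothetical DAU sketch. Assumption: unit j computes the dynamic weight
# w_j(x) = (A_j x + b_j) / ||A_j x + b_j|| and outputs w_j(x)^T x.

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def matmul(P, Q):
    # P is m x k, Q is k x n; returns the m x n product.
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def dau_weights(A_list, b_list, x):
    """Input-dependent weight matrix W(x): one unit-norm row per DAU."""
    W = []
    for A, b in zip(A_list, b_list):
        u = [ui + bi for ui, bi in zip(matvec(A, x), b)]
        norm = math.sqrt(sum(ui * ui for ui in u)) or 1.0  # guard against 0
        W.append([ui / norm for ui in u])
    return W

def random_layer(n_units, d_in, rng):
    """Hypothetical random parameters for a layer of n_units DAUs."""
    A_list = [[[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_in)]
              for _ in range(n_units)]
    b_list = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(n_units)]
    return A_list, b_list

def forward_with_contributions(layers, x):
    """Run stacked DAU layers; return the outputs and the effective
    linear map W_eff(x) with outputs == W_eff(x) x."""
    h, W_eff = x, None
    for A_list, b_list in layers:
        W = dau_weights(A_list, b_list, h)
        h = matvec(W, h)
        W_eff = W if W_eff is None else matmul(W, W_eff)
    return h, W_eff

rng = random.Random(0)
d0, d1, n_classes = 6, 4, 3
layers = [random_layer(d1, d0, rng), random_layer(n_classes, d1, rng)]
x = [rng.gauss(0, 1) for _ in range(d0)]

logits, W_eff = forward_with_contributions(layers, x)
c = max(range(n_classes), key=lambda k: logits[k])
# Per-input contributions to the predicted class logit.
contrib = [W_eff[c][i] * x[i] for i in range(d0)]
print(logits[c], sum(contrib))  # equal up to floating-point error
```

Because every weight vector has unit norm, each unit's output is bounded by ||x|| (Cauchy-Schwarz) and is largest when w(x) points along x, which is one way to read "alignment" here. The contributions W_eff[c][i] * x[i] sum to the logit because, for that particular input, the whole network acts as the single linear map W_eff.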
