Shallow-Deep Networks: Understanding and Mitigating Network Overthinking

We characterize a prevalent weakness of deep neural networks (DNNs): overthinking, which occurs when a DNN can reach a correct prediction before its final layer. Overthinking is computationally wasteful, and it can also be destructive when a correct internal prediction changes into a misclassification by the final layer. Understanding overthinking requires studying how each prediction evolves during a DNN's forward pass, a process that is conventionally opaque. To make predictions transparent, we propose the Shallow-Deep Network (SDN), a generic modification to off-the-shelf DNNs that attaches internal classifiers to intermediate layers. We apply the SDN modification to four modern architectures, trained on three image classification tasks, to characterize the overthinking problem. We show that SDNs can mitigate the wasteful effect of overthinking with confidence-based early exits, which reduce the average inference cost by more than 50% while preserving accuracy. We also find that the destructive effect occurs for 50% of misclassifications on natural inputs and that it can be induced adversarially with a recent backdooring attack. To mitigate this effect, we propose a new confusion metric that quantifies the internal disagreements likely to lead to misclassifications.
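
The early-exit mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the module names (InternalClassifier, ShallowDeepNet), the toy convolutional backbone, and the softmax-confidence threshold of 0.9 are assumptions made purely for illustration of the idea of attaching internal classifiers and stopping at the first sufficiently confident one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InternalClassifier(nn.Module):
    """Lightweight exit head (illustrative): global-average-pool the
    intermediate feature map, then apply a single linear layer."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, feats):
        pooled = F.adaptive_avg_pool2d(feats, 1).flatten(1)
        return self.fc(pooled)

class ShallowDeepNet(nn.Module):
    """Toy SDN-style model: a stack of convolutional blocks, each
    followed by an internal classifier (the last one acts as the
    usual final classifier)."""
    def __init__(self, num_classes=10):
        super().__init__()
        blocks, heads, in_ch = [], [], 3
        for out_ch in (16, 32, 64):
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            heads.append(InternalClassifier(out_ch, num_classes))
            in_ch = out_ch
        self.blocks = nn.ModuleList(blocks)
        self.heads = nn.ModuleList(heads)

    def forward(self, x):
        """Training-time forward: return logits from every exit so all
        classifiers can be supervised jointly."""
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits.append(head(x))
        return logits

    @torch.no_grad()
    def predict_early_exit(self, x, threshold=0.9):
        """Inference with confidence-based early exit (batch size 1):
        stop at the first exit whose softmax confidence exceeds the
        threshold; otherwise fall through to the final exit."""
        for i, (block, head) in enumerate(zip(self.blocks, self.heads)):
            x = block(x)
            probs = F.softmax(head(x), dim=1)
            conf, pred = probs.max(dim=1)
            if conf.item() >= threshold:
                return pred.item(), i          # exited early at block i
        return pred.item(), len(self.blocks) - 1  # reached the final exit

# Example usage with a random 32x32 RGB input.
model = ShallowDeepNet(num_classes=10).eval()
x = torch.randn(1, 3, 32, 32)
label, exit_index = model.predict_early_exit(x, threshold=0.9)
print(f"predicted class {label}, exited after block {exit_index}")
```

The per-exit logits returned by the training-time forward pass are also the natural raw material for a confusion-style measure of internal disagreement (e.g., how much the internal predictions differ from one another), though the paper's exact metric is not reproduced here.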
