Interpreting and Boosting Dropout from a Game-Theoretic View

This paper aims to understand and improve the utility of the dropout operation from the perspective of game-theoretic interactions. We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs), and we verify this theoretical result through various experiments. Furthermore, we find that such interactions are strongly related to the over-fitting problem in deep learning. Thus, the utility of dropout can be understood as reducing interactions in order to alleviate over-fitting. Based on this understanding, we propose an interaction loss to further improve the utility of dropout. Experimental results show that the interaction loss can effectively improve the utility of dropout and boost the performance of DNNs.
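The sketch below makes "interaction between input variables" concrete. It computes the pairwise Shapley interaction index of Grabisch and Roubens by exhaustive enumeration on a three-variable toy function. It rests on three assumptions. First, an absent variable is replaced by a zero baseline. Second, `toy_model`, `masked_value`, and `mean_abs_interaction` are hypothetical names for illustration, not the paper's code. Third, `mean_abs_interaction` is only a stand-in for an interaction-based penalty and is not claimed to be the exact interaction loss proposed here.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical toy "network": x0 and x1 interact multiplicatively,
    # while x2 contributes purely additively.
    return x[0] * x[1] + x[2]


def masked_value(model, x, subset, baseline=0.0):
    # v(S): evaluate the model with every variable outside S replaced by a
    # baseline value. The zero baseline is an assumption of this sketch.
    masked = [xi if k in subset else baseline for k, xi in enumerate(x)]
    return model(masked)


def shapley_interaction(model, x, i, j):
    # Pairwise Shapley interaction index:
    #   I(i,j) = sum over S in N \ {i,j} of
    #            |S|! (n-|S|-2)! / (n-1)!
    #            * [v(S+{i,j}) - v(S+{i}) - v(S+{j}) + v(S)]
    # Exact enumeration costs 2^(n-2) subsets per pair.
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for combo in combinations(others, size):
            s = set(combo)
            delta = (masked_value(model, x, s | {i, j})
                     - masked_value(model, x, s | {i})
                     - masked_value(model, x, s | {j})
                     + masked_value(model, x, s))
            total += weight * delta
    return total


def mean_abs_interaction(model, x):
    # Hypothetical penalty: average |I(i,j)| over all pairs of variables.
    pairs = list(combinations(range(len(x)), 2))
    return sum(abs(shapley_interaction(model, x, i, j)) for i, j in pairs) / len(pairs)


x = [1.0, 1.0, 1.0]
print(shapley_interaction(toy_model, x, 0, 1))  # 1.0
print(shapley_interaction(toy_model, x, 0, 2))  # 0.0
print(mean_abs_interaction(toy_model, x))
```

Only the multiplicative pair (x0, x1) receives a nonzero interaction, while pairs involving the additive term x2 get zero. The number of subsets grows exponentially with the number of variables. For realistic DNN inputs, the index would therefore have to be estimated by sampling subsets S rather than by enumerating them.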
