Neuron with Steady Response Leads to Better Generalization

Regularization can narrow the generalization gap between training and inference by introducing inductive bias. Existing work has proposed various inductive biases from diverse perspectives. However, to the best of our knowledge, none of it explores inductive bias from the perspective of the class-dependent response distribution of individual neurons. In this paper, we conduct a substantial analysis of the characteristics of this distribution. Based on the results, we articulate the Neuron Steadiness Hypothesis: a neuron that responds similarly to instances of the same class leads to better generalization. Accordingly, we propose a new regularization method, Neuron Steadiness Regularization, which reduces the intra-class response variance of neurons. We conduct extensive experiments on Multilayer Perceptrons, Convolutional Neural Networks, and Graph Neural Networks with popular benchmark datasets from diverse domains. The results show that Neuron Steadiness Regularization consistently outperforms the vanilla versions of these models, with significant gains and low additional overhead.
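To make the idea of intra-class response variance concrete, the following is a minimal sketch in NumPy, not the formulation used in the paper. The abstract does not give the exact loss, so the function name `intra_class_variance_penalty`, the use of population variance, the unweighted sum over classes and neurons, and the weight `lam` in the usage note are all assumptions made for illustration.

```python
import numpy as np


def intra_class_variance_penalty(activations, labels):
    """Sum, over classes and neurons, of each neuron's response variance
    across the instances of one class (hypothetical helper, see lead-in).

    activations: array of shape (n_samples, n_neurons)
    labels:      array of shape (n_samples,) holding class ids
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    penalty = 0.0
    for c in np.unique(labels):
        class_acts = activations[labels == c]  # rows belonging to class c
        # Population variance of each neuron within class c, summed over neurons.
        penalty += class_acts.var(axis=0).sum()
    return float(penalty)


# Toy batch: two neurons, two classes.
acts = np.array([[1.0, 0.0],
                 [3.0, 0.0],
                 [5.0, 2.0],
                 [5.0, 4.0]])
labels = np.array([0, 0, 1, 1])
penalty = intra_class_variance_penalty(acts, labels)
```

Under these assumptions, a training objective would take the form `task_loss + lam * penalty`, where `lam` is a hypothetical hyperparameter. The penalty is zero exactly when every neuron gives the same response to all instances of a class in the batch.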
