DMA Regularization: Enhancing Discriminability of Neural Networks by Decreasing the Minimal Angle

Most discriminative feature learning methods are developed specifically for metric learning, and their effectiveness on other tasks may be less evident. In this letter, we propose a novel discrimination regularization method for image classification that enhances intra-class compactness and inter-class discrepancy simultaneously by decreasing the minimal angle (DMA) between the feature vector and any of the weight vectors in the classification layer. The method robustly improves the discriminability and generalizability of neural networks, and it is applied simply by adding the DMA regularization term to the loss function, with negligible computational overhead. Because DMA regularization is simple, efficient, and effective, it can serve as a basic regularization method for neural-network-based models. We evaluate DMA by applying it to various modern models on the CIFAR10, CIFAR100, and TinyImageNet datasets, where it decreases the test error rate by 0.2–0.4%, 0.2–1.5%, and 0.3–0.4%, respectively. Code is available at: https://github.com/wznpub/DMA_Regularization.
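
To make the idea of the regularization term concrete, the following is a minimal sketch in plain Python, not the authors' implementation (see the repository linked in the abstract for that). It assumes the term is the batch mean of the smallest angle between each feature vector and the classification-layer weight vectors, and that it is added to the task loss with a coefficient. The names `min_angle`, `dma_penalty`, `total_loss`, and the coefficient `lam` with its default value are hypothetical, and all vectors are assumed to be nonzero.

```python
import math


def min_angle(feature, weights):
    """Smallest angle (in radians) between `feature` and any vector in `weights`."""
    f_norm = math.sqrt(sum(x * x for x in feature))
    best = math.pi  # largest possible angle between two vectors
    for w in weights:
        w_norm = math.sqrt(sum(x * x for x in w))
        cos = sum(a * b for a, b in zip(feature, w)) / (f_norm * w_norm)
        cos = max(-1.0, min(1.0, cos))  # clamp against floating-point rounding
        best = min(best, math.acos(cos))
    return best


def dma_penalty(features, weights):
    """Assumed form of the term: batch mean of the minimal angle."""
    return sum(min_angle(f, weights) for f in features) / len(features)


def total_loss(task_loss, features, weights, lam=0.1):
    """Task loss plus the weighted penalty; `lam` is a hypothetical coefficient."""
    return task_loss + lam * dma_penalty(features, weights)


# Two class weight vectors and a batch of two feature vectors.
weights = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [1.0, 1.0]]
penalty = dma_penalty(features, weights)
```

Minimizing such a penalty pulls each feature toward its nearest weight vector. In a real training setup the same computation would be written with a framework's differentiable tensor operations so that gradients reach both the features and the classification-layer weights.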
