Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks

Convolutional neural network training can suffer from diverse issues such as exploding or vanishing gradients, scaling-based weight-space symmetry, and covariate shift. To address these issues, researchers have developed weight regularization methods and activation normalization methods. In this work we propose a weight soft-regularization method based on the Oblique manifold. The proposed method uses a loss term that pushes each weight vector toward unit norm, i.e. the weight matrix is smoothly steered toward the so-called Oblique manifold. We evaluate our method on the popular CIFAR-10, CIFAR-100, and ImageNet 2012 datasets using two state-of-the-art architectures, namely ResNet and Wide ResNet. Our method introduces negligible computational overhead, and the results show that it is competitive with the state of the art and in some cases superior to it. Additionally, the results are less sensitive to hyperparameter settings such as batch size and regularization factor.
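As a rough illustration of the idea (a minimal sketch, not necessarily the paper's exact formulation), the soft regularizer can be read as adding a penalty of the form lambda * sum_i (1 - ||w_i||_2)^2 over the per-filter weight vectors to the task loss. In the sketch below, the function name norm_loss, the use of PyTorch, the choice of which layers to regularize, and the default value of lam are illustrative assumptions.

    import torch
    import torch.nn as nn

    def norm_loss(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
        # Soft-regularization term steering each filter's weight vector
        # toward unit L2 norm, i.e. toward the Oblique manifold.
        penalty = torch.zeros((), device=next(model.parameters()).device)
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                # Treat each output filter (or each row of a linear layer)
                # as one weight vector and penalize its deviation from unit norm.
                w = module.weight.reshape(module.weight.shape[0], -1)
                norms = w.norm(dim=1)
                penalty = penalty + ((1.0 - norms) ** 2).sum()
        return lam * penalty

    # Hypothetical training step: add the penalty to the task loss.
    # loss = criterion(model(x), y) + norm_loss(model, lam=1e-4)
    # loss.backward()

Because the penalty only touches the weight norms (not the activations), it adds a single pass over the parameters per step, which is consistent with the negligible overhead claimed above.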
