Layer-Wise Weight Decay for Deep Neural Networks

In this paper, we propose layer-wise weight decay for the efficient training of deep neural networks. Our method assigns a different weight-decay coefficient to each layer so that the ratio between the scale of the back-propagated gradients and that of the weight-decay term is constant throughout the network. With this setting, we can avoid under- or over-fitting and train all layers properly without having to tune the coefficients layer by layer. Experimental results show that our method improves the performance of existing deep neural networks without any change to the network models.
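
As a rough illustration of the setting described above (a minimal sketch, not the paper's exact algorithm), the snippet below assumes the per-layer coefficient is chosen as lambda_l = base_decay * ||g_l|| / ||W_l||, which makes the ratio ||g_l|| / (lambda_l * ||W_l||) the same for every layer. All names here (layerwise_decay_coeffs, sgd_step, base_decay, eps) are illustrative assumptions, not taken from the paper.

    import numpy as np

    def layerwise_decay_coeffs(weights, grads, base_decay, eps=1e-12):
        """Illustrative per-layer coefficients lambda_l = base_decay * ||g_l|| / ||W_l||.

        This keeps ||g_l|| / (lambda_l * ||W_l||) equal to 1 / base_decay in
        every layer, so the gradient-to-decay ratio is constant network-wide.
        The eps guard is an assumption to protect against zero-norm layers.
        """
        return [base_decay * np.linalg.norm(g) / max(np.linalg.norm(W), eps)
                for W, g in zip(weights, grads)]

    def sgd_step(weights, grads, lr, base_decay):
        """One plain-SGD update with the layer-wise decay applied additively."""
        coeffs = layerwise_decay_coeffs(weights, grads, base_decay)
        return [W - lr * (g + lam * W)
                for W, g, lam in zip(weights, grads, coeffs)]

How often to recompute the coefficients (every step, every epoch, or once from initial gradient statistics) is left open in this sketch; the abstract does not fix that schedule.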
