Paper Information - Generalizing and Improving Weight Initialization

Generalizing and Improving Weight Initialization

We propose a new weight initialization suited to arbitrary nonlinearities by generalizing previous weight initializations. The initialization uses simple corrective scalars to account for the effect of the dropout rate and of an arbitrary nonlinearity on variance. Consequently, it requires neither mini-batch statistics nor weight pre-initialization. This simple method improves accuracy over previous initializations, and it allows training highly regularized neural networks where previous initializations lead to poor convergence.
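To make the idea of corrective scalars concrete, here is a minimal sketch rather than the paper's exact formula. It assumes a variance-preserving forward pass with inverted dropout (activations scaled by 1/keep_prob during training) and unit-variance Gaussian pre-activations, which suggests a weight variance of keep_prob / (n_in * E[f(z)^2]). The names `second_moment`, `corrected_init`, and `keep_prob` are hypothetical and chosen only for illustration.

```python
import numpy as np


def second_moment(f, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[f(z)^2] for z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return float(np.mean(f(z) ** 2))


def corrected_init(n_in, n_out, f, keep_prob=1.0, seed=0):
    """Gaussian weights with variance keep_prob / (n_in * E[f(z)^2]).

    Assumption (not taken from the source): this rule follows from asking
    that pre-activations keep unit variance when the layer input is
    f(z) * mask / keep_prob, i.e. inverted dropout.
    """
    rng = np.random.default_rng(seed)
    std = np.sqrt(keep_prob / (n_in * second_moment(f)))
    return rng.standard_normal((n_in, n_out)) * std


def relu(z):
    return np.maximum(z, 0.0)


# Example: a 512 -> 256 layer fed by ReLU activations with 50% dropout.
W = corrected_init(512, 256, relu, keep_prob=0.5)
```

Under these assumptions, a ReLU with no dropout (keep_prob = 1) gives E[f(z)^2] = 1/2 and hence a variance of 2 / n_in, which matches the rectifier initialization from reference [4]. Any other nonlinearity only changes the estimated second moment, so no mini-batch statistics are needed.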

Kevin Gimpel | Dan Hendrycks
