Generalizing and Improving Weight Initialization

We propose a new weight initialization suited for arbitrary nonlinearities by generalizing previous weight initializations. The initialization uses simple corrective scalars to account for the effect of the dropout rate and of an arbitrary nonlinearity on variance. Consequently, it requires neither computing mini-batch statistics nor weight pre-initialization. This simple method yields improved accuracy over previous initializations, and it allows for training highly regularized neural networks where previous initializations lead to poor convergence.
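
To make the idea of corrective scalars concrete, the following is a minimal sketch and not necessarily the exact rule proposed here, since the abstract does not state the formula. It assumes zero-mean Gaussian weights, pre-activations z that are approximately standard normal, and inverted dropout (kept units are rescaled by 1/p, where p is the keep probability). Under those assumptions, keeping the pre-activation variance near 1 suggests a weight variance of p / (n_in * E[f(z)^2]). The names second_moment, init_std, and init_weights are hypothetical helpers introduced only for illustration.

```python
import math
import random


def second_moment(f, lo=-10.0, hi=10.0, steps=20000):
    """Approximate E[f(z)^2] for z ~ N(0, 1) with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += f(z) ** 2 * pdf * h
    return total


def init_std(fan_in, f, keep_prob=1.0):
    """Weight std under the assumed setup: sqrt(p / (n_in * E[f(z)^2]))."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    # Two corrective scalars: keep_prob for dropout, E[f(z)^2] for the nonlinearity.
    return math.sqrt(keep_prob / (fan_in * second_moment(f)))


def init_weights(fan_in, fan_out, f, keep_prob=1.0, seed=0):
    """Draw a fan_out x fan_in matrix of zero-mean Gaussian weights."""
    rng = random.Random(seed)
    std = init_std(fan_in, f, keep_prob)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def relu(z):
    return max(z, 0.0)
```

With f set to ReLU and no dropout (keep_prob=1.0), E[f(z)^2] is 1/2, so the sketch gives a weight variance of 2 / n_in, the familiar ReLU-specific scaling. Other nonlinearities such as math.tanh can be passed in the same way. No mini-batch statistics are computed at any point, since the scalars depend only on the nonlinearity and the keep probability.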