An Empirical Study on Position of the Batch Normalization Layer in Convolutional Neural Networks

In this paper, we have studied how training of the convolutional neural networks (CNNs) can be affected by changing the position of the batch normalization (BN) layer. Three different convolutional neural networks have been chosen for our experiments. These networks are AlexNet, VGG-16, and ResNet- 20. We show that the speed-up provided by the BN algorithm can be further improved by using the BN in positions other than the one suggested by its original paper. Also, we discuss how the BN layer in a certain position can aid the training of one network but not the other. Three different positions for the BN layer have been studied in this research, these positions are: BN layer between the convolution layer and the non-linear activation function, BN layer after the non-linear activation function and finally, the BN layer before each of the convolutional layers.

[1]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[4]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[6]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[7]  Ya Le,et al.  Tiny ImageNet Visual Recognition Challenge , 2015 .

[8]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  Sepp Hochreiter,et al.  The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[13]  Saman Ghili,et al.  Tiny ImageNet Visual Recognition Challenge , 2014 .