Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network

Training a deep neural network with a large number of parameters often leads to overfitting. Dropout has recently been introduced as a simple yet effective regularization technique to combat overfitting in such models. Although Dropout has shown remarkable results in many deep neural network settings, its actual effect on CNNs has not been thoroughly explored. Moreover, training a Dropout model significantly increases training time, as it takes longer to converge than a non-Dropout model with the same architecture. To address these issues, we propose Biased Dropout and Crossmap Dropout, two novel extensions of Dropout based on the behavior of hidden units in a CNN model. Biased Dropout divides the hidden units in a given layer into two groups according to their activation magnitude and applies a different Dropout rate to each group. Hidden units with higher activation values, which contribute more to the network's final performance, are retained with a lower Dropout rate, while units with lower activation values are subjected to a higher Dropout rate to compensate. The second approach, Crossmap Dropout, extends regular Dropout to the convolutional layer. The feature maps of a convolutional layer are strongly correlated with one another, particularly at identical pixel locations across maps. Crossmap Dropout preserves this important cross-map correlation while breaking the correlation between adjacent pixels by applying the same Dropout mask to all feature maps, so that units at equivalent positions in every feature map are either all dropped or all active during training. Experiments on various benchmark datasets show that our approaches generalize better than regular Dropout. Moreover, Biased Dropout converges faster during training, suggesting that injecting noise into hidden units appropriately can lead to effective regularization.
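
To make the Biased Dropout mechanism concrete, the following is a minimal NumPy sketch of the idea described above. The function name, the median split used to form the two groups, the specific rates p_high and p_low, and the inverted-Dropout rescaling are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    def biased_dropout(activations, p_high=0.3, p_low=0.7, training=True):
        # Sketch of Biased Dropout: units with larger activation magnitude
        # (the more informative group) are dropped with a LOWER rate p_high,
        # while the remaining units are dropped with a HIGHER rate p_low.
        if not training:
            return activations  # inverted Dropout: nothing to do at test time

        # Split the layer into two groups by activation magnitude; the median
        # threshold is an assumption made for illustration.
        threshold = np.median(np.abs(activations))
        high_group = np.abs(activations) >= threshold

        # Assign each unit its group's drop probability and sample a mask.
        drop_prob = np.where(high_group, p_high, p_low)
        mask = np.random.rand(*activations.shape) >= drop_prob

        # Rescale the kept units so the expected activation is unchanged.
        return activations * mask / (1.0 - drop_prob)

For example, biased_dropout(h, 0.3, 0.7) keeps roughly 70% of the high-activation half of a layer h but only 30% of the low-activation half, matching the asymmetric treatment described in the abstract.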
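
Similarly, here is a minimal sketch of Crossmap Dropout, assuming a (channels, height, width) activation layout: a single spatial mask is sampled once and broadcast across all feature maps, so that units at the same pixel position are dropped or kept together.

    import numpy as np

    def crossmap_dropout(feature_maps, p=0.5, training=True):
        # feature_maps: one example's activations, shape (channels, height, width).
        if not training:
            return feature_maps

        _, h, w = feature_maps.shape
        # Sample ONE (h, w) mask and broadcast it over the channel axis, so the
        # same pixel location is zeroed in every map. This preserves the
        # cross-map correlation while still injecting spatial noise between
        # adjacent pixels.
        mask = (np.random.rand(1, h, w) >= p).astype(feature_maps.dtype)

        # Inverted-Dropout rescaling (an assumption made for illustration).
        return feature_maps * mask / (1.0 - p)

Broadcasting the (1, h, w) mask against the (channels, h, w) tensor is what distinguishes this from regular Dropout, which would instead sample an independent mask entry for every unit in every map.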
