Dropout with Tabu Strategy for Regularizing Deep Neural Networks

Dropout has proven to be an effective technique for regularizing deep neural networks (DNNs) and preventing the co-adaptation of neurons. It randomly drops units with a probability $p$ during training, and it also provides a way of approximately combining exponentially many different neural network architectures efficiently. In this work, we add a diversification strategy to dropout that aims to generate more diverse sub-network architectures across training iterations. The units dropped in the last forward propagation are marked. Units selected for dropping in the current forward propagation are instead kept if they were marked in the previous one; only the units from the most recent forward propagation are marked. We call this new technique Tabu Dropout. Tabu Dropout introduces no extra parameters compared with standard dropout and is computationally cheap. Experiments on the MNIST and Fashion-MNIST datasets show that Tabu Dropout improves upon the performance of standard dropout.
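To make the mechanism concrete, here is a minimal sketch of a tabu-constrained dropout layer written as a PyTorch-style module. The class name `TabuDropout`, the `tabu` buffer, and the inverted-dropout scaling are assumptions made for this sketch, not the authors' released code.

```python
import torch
import torch.nn as nn

class TabuDropout(nn.Module):
    """Dropout that avoids dropping the same unit in two consecutive passes.

    Units dropped in the previous forward pass are marked as 'tabu' and are
    forced to be kept this time, diversifying the sampled sub-networks.
    Only the most recent forward pass is remembered.
    """

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p
        self.tabu = None  # mask of units dropped in the previous pass

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x

        # Propose a standard dropout mask: 1 = keep, 0 = drop.
        keep = (torch.rand_like(x) > self.p).float()

        # Units marked tabu (dropped last time) are kept this time.
        if self.tabu is not None and self.tabu.shape == keep.shape:
            keep = torch.clamp(keep + self.tabu, max=1.0)

        # Remember which units were dropped now; they are tabu next pass.
        self.tabu = (1.0 - keep).detach()

        # Inverted-dropout scaling, following the standard dropout convention.
        return x * keep / (1.0 - self.p)
```

Such a module could be placed wherever `nn.Dropout` would normally be used, e.g. between a linear layer and its successor, without any change to the rest of the training loop.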
