Controlled dropout: A different approach to using dropout on deep neural network

Deep neural networks (DNNs) achieve outstanding performance in many areas but consume considerable memory and time during training. We propose a controlled dropout technique with the potential to reduce both the memory footprint and the training time of DNNs. Dropout is a popular algorithm that mitigates overfitting in DNNs by randomly dropping units during training. Unlike conventional dropout, which drops units at random, the proposed controlled dropout deliberately selects which units to drop, which may allow training time and memory usage to be reduced. In this paper, we focus on validating whether controlled dropout can replace conventional dropout, as a step toward our broader goal of improving training speed and memory efficiency. We compare the performance of controlled dropout and conventional dropout on an image classification task using handwritten digits from the MNIST (Modified National Institute of Standards and Technology) dataset. The experimental results show that the proposed controlled dropout is as effective as conventional dropout. Furthermore, the results suggest that controlled dropout is more efficient when an appropriate dropout rate and number of hidden layers are used.
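
To make the contrast concrete, below is a minimal NumPy sketch of the two masking schemes the abstract describes: conventional dropout zeroes units independently at random, whereas a controlled scheme selects the surviving units up front so they form a smaller dense sub-network. The function names, the once-per-batch selection rule, and the compact sub-network representation are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def conventional_dropout(activations, rate, rng):
    # Standard (inverted) dropout: each unit is zeroed independently
    # with probability `rate`; survivors are rescaled by 1 / (1 - rate).
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

def controlled_dropout(activations, rate, rng):
    # Controlled dropout (sketch, assumed selection rule): choose the
    # surviving units once per mini-batch, so the kept columns form a
    # smaller dense sub-network that could be stored and updated compactly.
    n_units = activations.shape[1]
    n_keep = int(round(n_units * (1.0 - rate)))
    keep_idx = np.sort(rng.choice(n_units, size=n_keep, replace=False))
    return activations[:, keep_idx] / (1.0 - rate), keep_idx

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                  # batch of 4 examples, 8 hidden units
dropped = conventional_dropout(x, 0.5, rng)      # shape (4, 8), about half the entries zeroed
compact, kept = controlled_dropout(x, 0.5, rng)  # shape (4, 4): only the kept units remain
print(dropped.shape, compact.shape, kept)
```

In the conventional case the full weight matrices must still be held in memory because any unit may survive; in the controlled case the kept-index set is known in advance, which is the property the paper argues can be exploited to save memory and training time.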
