Handling dropout probability estimation in convolution neural networks using meta-heuristics

Deep learning-based approaches have become prominent in recent years, mainly due to their outstanding results in several application domains, ranging from face and object recognition to handwritten digit identification. Convolutional neural networks (CNNs) have attracted considerable attention since they model intrinsic and complex working mechanisms of the brain. However, one main shortcoming of such models is overfitting, which prevents the network from predicting unseen data effectively. In this paper, we address this problem by selecting the dropout probability, a regularization parameter, for CNNs using meta-heuristic-driven techniques. As far as we know, this is the first attempt to tackle this issue using this methodology. For comparison purposes, we also consider a CNN with a default dropout parameter and a CNN without dropout. The results revealed that optimizing dropout-based CNNs is worthwhile, mainly because suitable dropout probability values can be found easily, without the need to set new parameters empirically.
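As a rough illustration of what selecting a dropout probability with a meta-heuristic can look like, the sketch below runs a minimal particle swarm optimization over a single value p in [0, 1]. Particle swarm optimization is used only as a familiar example of a meta-heuristic; the abstract does not state which techniques were evaluated. The names `toy_validation_error` and `pso_dropout`, the swarm settings, and the fitness function itself are assumptions made for this example. In a real setting, the fitness would train a CNN with the candidate dropout probability and return its validation error.

```python
import random


def toy_validation_error(p):
    """Hypothetical stand-in for 'train a CNN with dropout p, return validation error'.

    The minimum is placed at p = 0.4 purely for illustration.
    """
    return (p - 0.4) ** 2 + 0.05


def pso_dropout(fitness, n_particles=8, n_iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal 1-D particle swarm search for a dropout probability in [0, 1]."""
    rng = random.Random(seed)

    # Each particle is one candidate dropout probability.
    pos = [rng.random() for _ in range(n_particles)]
    vel = [0.0] * n_particles

    # Personal bests and the swarm-wide best found so far.
    best_pos = pos[:]
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity blends inertia, pull toward the personal best,
            # and pull toward the swarm-wide best.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Keep the candidate a valid probability.
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))

            f = fitness(pos[i])
            if f < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f < g_fit:
                    g_pos, g_fit = pos[i], f

    return g_pos, g_fit


best_p, best_err = pso_dropout(toy_validation_error)
print(f"best dropout probability: {best_p:.3f}, toy validation error: {best_err:.4f}")
```

Because every fitness evaluation would correspond to a full training run in practice, the swarm size and iteration count are kept small here. The search only needs the bounds of p, which is consistent with the abstract's point that no new parameters have to be set empirically for the dropout value itself.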
