Automated Optimal Architecture of Deep Convolutional Neural Networks for Image Recognition

Recent advances in deep Convolutional Neural Networks (CNNs) have led to impressive progress in computer vision, especially in image classification. CNNs involve numerous hyperparameters that define the network's structure, such as network depth, kernel size, number of feature maps, stride, and pooling size and regions. These hyperparameters have a significant impact on the classification accuracy of a CNN, and the proper architecture differs from one dataset to another. An empirical approach is often used to determine near-optimal values for these hyperparameters, and some recent works have also applied optimization techniques to hyperparameter selection. In this paper, we develop a framework for hyperparameter optimization based on a new objective function that combines the accuracy of the trained CNN model with information from the visualization of learned feature maps via deconvolutional networks. The Nelder-Mead Algorithm (NMA) guides the CNN architecture towards near-optimal hyperparameters. Our proposed approach is evaluated on the CIFAR-10 and Caltech-101 benchmarks. The experimental results indicate that the final CNN architecture obtained with our objective function outperforms other approaches in terms of accuracy. We show that our optimization framework increases the depth of the network and shrinks the stride and pooling sizes to obtain the best CNN architecture.
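The abstract does not spell out the optimization loop, but the overall shape of a Nelder-Mead search over hyperparameters can be sketched as follows. This is a minimal illustration, not the paper's implementation: `objective` here is a hypothetical stand-in (a toy quadratic with a known optimum) for the paper's combined score of validation accuracy and deconvnet-based feature-map quality, and the hyperparameters are relaxed to continuous values so the simplex method can operate on them, then rounded back to integers.

```python
from scipy.optimize import minimize

# Hypothetical stand-in for the paper's objective: in the real framework,
# this would train a CNN with the given (continuously relaxed)
# hyperparameters and combine validation error with a score derived from
# deconvolutional visualization of the learned feature maps.
# Here: a toy quadratic whose minimum sits at kernel=3, stride=1, pool=2.
def objective(x):
    kernel, stride, pool = x
    return (kernel - 3.0) ** 2 + (stride - 1.0) ** 2 + (pool - 2.0) ** 2

# A plausible starting architecture used to seed the initial simplex.
x0 = [5.0, 2.0, 3.0]

result = minimize(objective, x0, method="Nelder-Mead",
                  options={"xatol": 1e-3, "fatol": 1e-3})

# Round the continuous solution back to integer hyperparameters.
best = [round(v) for v in result.x]
print(best)  # expected to converge near [3, 1, 2]
```

In practice each evaluation of `objective` is expensive (a full CNN training run), which is why derivative-free methods such as Nelder-Mead, Bayesian optimization, or random search are the usual choices for this setting.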
