Finding Better Topologies for Deep Convolutional Neural Networks by Evolution

Because of the nonlinearity of artificial neural networks, designing topologies for deep convolutional neural networks (CNNs) is a challenging task, and often only heuristic approaches, such as trial and error, can be applied. Evolutionary algorithms can solve optimization problems where the fitness landscape is unknown. However, they are computationally intensive, which makes them difficult to apply to problems involving deep CNNs. In this paper, we propose an evolutionary strategy for finding better topologies for deep CNNs. By incorporating the concepts of knowledge inheritance and knowledge learning, our evolutionary algorithm can be executed with limited computing resources. We applied the proposed algorithm to find effective topologies of deep CNNs for the image classification task on the CIFAR-10 dataset. After the evolution, we analyzed the topologies that performed well on this task. Our studies verify techniques that have been commonly used in human-designed deep CNNs. We also discovered that certain graph properties greatly affect system performance. We applied the guidelines learned from the evolution to design new network topologies that outperform residual networks with fewer layers on the CIFAR-10, CIFAR-100, and SVHN datasets.
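
As a rough illustration of the search procedure summarized above, the Python sketch below runs a minimal evolutionary loop over topology encodings. The DAG encoding, the single-edge mutation operator, the truncation selection scheme, and the placeholder fitness function are illustrative assumptions rather than the authors' exact method; in the actual experiments, fitness would come from training each candidate CNN on CIFAR-10, with knowledge inheritance copying trained weights from parent to child.

import random

# Minimal sketch of an evolutionary topology search (illustrative assumptions,
# not the paper's exact encoding or operators).

def random_topology(num_nodes=6):
    # Encode a topology as a DAG: node i receives edges from some nodes j < i.
    return {i: sorted(random.sample(range(i), k=random.randint(1, i)))
            for i in range(1, num_nodes)}

def mutate(topology):
    # Add or remove a single skip connection. In the paper's setting, knowledge
    # inheritance would additionally copy trained weights from parent to child.
    child = {i: list(parents) for i, parents in topology.items()}
    node = random.choice(list(child.keys()))
    candidate = random.randrange(node)
    if candidate in child[node] and len(child[node]) > 1:
        child[node].remove(candidate)
    else:
        child[node] = sorted(set(child[node]) | {candidate})
    return child

def fitness(topology):
    # Placeholder: stands in for validation accuracy after (partial) training.
    return sum(len(parents) for parents in topology.values()) + random.random()

def evolve(pop_size=10, generations=20):
    population = [random_topology() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        survivors = scored[: pop_size // 2]            # truncation selection
        children = [mutate(random.choice(survivors))   # offspring inherit structure
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("Best topology (node -> incoming edges):", best)

Truncation selection is used here only for brevity; any standard selection operator could be substituted without changing the overall structure of the search.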
