A new efficient training strategy for deep neural networks by hybridization of artificial bee colony and limited-memory BFGS optimization algorithms

Abstract Working with deep learning techniques requires a profound understanding of the mechanisms underlying the optimization of the internal parameters of complex structures. The major factor limiting this understanding is that only a few optimization methods, such as gradient descent and limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS), are available to find the best local minima of the problem space for complex structures such as a deep neural network (DNN). Therefore, in this paper, we present a new training approach named the hybrid artificial bee colony based training strategy (HABCbTS) to tune the parameters of a DNN structure that consists of one or more autoencoder layers cascaded to a softmax classification layer. In this strategy, a derivative-free optimization algorithm, “ABC”, is combined with a derivative-based algorithm, “L-BFGS”, to construct “HABC”, which is used in the HABCbTS. Detailed simulation results supported by statistical analysis show that the proposed training strategy yields better classification performance than the same DNN classifier trained with L-BFGS, ABC and modified ABC. The obtained classification results are also compared with those of state-of-the-art classifiers, including MLP, SVM, KNN, DT and NB, on 15 data sets with different dimensions and sizes.
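The general idea of pairing a derivative-free search with a derivative-based refinement can be illustrated with a minimal sketch. It is not the authors' HABC or HABCbTS, whose details the abstract does not give. Everything here is an assumption made for illustration: the function names `sphere` and `hybrid_abc_lbfgs`, the toy objective standing in for a DNN training loss, the parameter values, the choice to refine only the best food source every few cycles, and the use of SciPy's bounded `L-BFGS-B` variant with finite-difference gradients in place of plain L-BFGS.

```python
import numpy as np
from scipy.optimize import minimize


def sphere(x):
    """Toy objective standing in for a DNN training loss (assumption)."""
    return float(np.sum(x ** 2))


def hybrid_abc_lbfgs(f, dim, n_food=10, cycles=30, limit=10,
                     lo=-5.0, hi=5.0, refine_every=10, seed=0):
    """Illustrative ABC search with periodic L-BFGS-B refinement of the best source."""
    rng = np.random.default_rng(seed)
    foods = rng.uniform(lo, hi, size=(n_food, dim))   # candidate solutions
    costs = np.array([f(x) for x in foods])
    trials = np.zeros(n_food, dtype=int)              # stagnation counters
    best_x, best_c = foods[np.argmin(costs)].copy(), float(costs.min())

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to a random partner k != i.
        k = rng.choice([j for j in range(n_food) if j != i])
        d = rng.integers(dim)
        cand = foods[i].copy()
        cand[d] += rng.uniform(-1.0, 1.0) * (foods[i, d] - foods[k, d])
        cand = np.clip(cand, lo, hi)
        c = f(cand)
        if c < costs[i]:                              # greedy selection
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for cycle in range(1, cycles + 1):
        # Employed bee phase: one neighbour trial per food source.
        for i in range(n_food):
            try_neighbour(i)

        # Onlooker bee phase: sources are picked in proportion to fitness.
        fitness = np.where(costs >= 0, 1.0 / (1.0 + costs), 1.0 + np.abs(costs))
        probs = fitness / fitness.sum()
        for i in rng.choice(n_food, size=n_food, p=probs):
            try_neighbour(i)

        # Derivative-based step: refine the current best source with L-BFGS-B.
        # No gradient is passed, so SciPy approximates it by finite differences.
        if cycle % refine_every == 0:
            b = int(np.argmin(costs))
            res = minimize(f, foods[b], method="L-BFGS-B",
                           bounds=[(lo, hi)] * dim)
            if res.fun < costs[b]:
                foods[b], costs[b], trials[b] = res.x, float(res.fun), 0

        # Memorize the best solution before any source is abandoned.
        b = int(np.argmin(costs))
        if costs[b] < best_c:
            best_x, best_c = foods[b].copy(), float(costs[b])

        # Scout bee phase: replace the most stagnant source if it exceeds the limit.
        worst = int(np.argmax(trials))
        if trials[worst] > limit:
            foods[worst] = rng.uniform(lo, hi, size=dim)
            costs[worst] = f(foods[worst])
            trials[worst] = 0

    return best_x, best_c


if __name__ == "__main__":
    x, c = hybrid_abc_lbfgs(sphere, dim=5)
    print("best cost:", c)
```

In a DNN setting, `f` would be the training loss as a function of the flattened weight vector, and an analytic gradient would normally be passed to `minimize` through its `jac` argument instead of relying on finite differences.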
