Research on a learning rate with energy index in deep learning

The stochastic gradient descent (SGD) algorithm is the dominant optimization method in deep learning, and its performance depends critically on how the learning rate is tuned over time. In this paper, we propose a novel energy-index-based optimization method (EIOM) that automatically adjusts the learning rate during backpropagation. Since a frequently occurring feature is more important than a rarely occurring one, we update features to different extents according to their frequencies. We first define an energy neuron model and then design an energy index that describes the frequency of a feature; the learning rate is then expressed as a function of this energy index. To evaluate EIOM empirically, we compare it against other optimizers on three popular machine learning models: logistic regression, a multilayer perceptron, and a convolutional neural network. The experiments demonstrate the promising performance of the proposed EIOM relative to other optimization algorithms.
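The abstract does not give the exact energy neuron model or energy index formula, so the sketch below is only a plausible reconstruction, not the paper's method. It assumes the energy index is an exponential moving average of per-parameter gradient magnitudes, used as a proxy for feature frequency, and that the learning rate is scaled proportionally to the index so that frequently firing features receive larger updates, matching the premise above. The class name `EnergyIndexSGD` and all hyperparameter values are illustrative.

```python
import numpy as np

class EnergyIndexSGD:
    """Illustrative per-parameter SGD variant: each weight's step size is
    modulated by an 'energy index' that tracks how frequently (and how
    strongly) its gradient fires. Hypothetical reconstruction from the
    abstract; the paper's energy neuron model may differ."""

    def __init__(self, params, base_lr=0.01, decay=0.9, eps=1e-8):
        self.params = params                      # list of NumPy arrays, updated in place
        self.base_lr = base_lr                    # global learning rate
        self.decay = decay                        # EMA smoothing for the energy index
        self.eps = eps                            # numerical safeguard
        # one energy accumulator per parameter tensor
        self.energy = [np.zeros_like(p) for p in params]

    def step(self, grads):
        for p, g, e in zip(self.params, grads, self.energy):
            # Energy index: exponential moving average of gradient magnitude,
            # a proxy for how often each feature is activated (assumption).
            np.multiply(e, self.decay, out=e)
            e += (1.0 - self.decay) * np.abs(g)
            # Learning rate as a function of the energy index: parameters with
            # above-average energy (more frequent features) get larger steps.
            scale = e / (e.mean() + self.eps)
            p -= self.base_lr * scale * g

# Toy usage: one step on the quadratic objective 0.5 * ||w||^2, whose gradient is w.
w = np.random.randn(5)
opt = EnergyIndexSGD([w], base_lr=0.1)
opt.step([w.copy()])
```

Normalizing by the mean energy keeps the per-parameter rates centered around the base learning rate; an absolute scaling (or the inverse scaling used by AdaGrad-style methods) would be an equally plausible reading of the abstract.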
