A novel neural-network gradient optimization algorithm based on reinforcement learning

Choosing an appropriate step size and other hyperparameters is key to robust convergence of gradient-descent optimization algorithms. This study proposes a novel gradient-descent strategy based on reinforcement learning, in which the gradient information at each time step of a neural network's iterative optimization is expressed as the state of a Markov decision process. We design a variable-view-distance planner, with a Markov decision process as its recursive core, for neural-network gradient descent. The planner combines the advantages of model-free and model-based learning and fully exploits the state-transition information of the optimized neural-network objective function at each step. Experimental results show that the proposed method retains the asymptotic optimality of a model-free strategy while achieving better sample efficiency than hand-designed optimization algorithms.
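The core framing above, treating each step of iterative optimization as a Markov decision process whose state is the current gradient information, can be sketched as follows. This is a minimal illustrative toy, not the paper's actual planner: the class name `GradientDescentMDP`, the choice of step size as the action, loss decrease as the reward, and the quadratic objective are all assumptions for demonstration.

```python
import numpy as np

class GradientDescentMDP:
    """Hypothetical sketch: gradient descent as an MDP.

    State  = gradient at the current parameters,
    action = step size chosen by a learned policy,
    reward = decrease in the objective (so maximizing return minimizes loss).
    """

    def __init__(self, grad_fn, loss_fn, theta0):
        self.grad_fn = grad_fn
        self.loss_fn = loss_fn
        self.theta = np.asarray(theta0, dtype=float)

    def state(self):
        # The optimizer's observation is the gradient at the current parameters.
        return self.grad_fn(self.theta)

    def step(self, step_size):
        # Action: a step size; transition: one gradient-descent update.
        loss_before = self.loss_fn(self.theta)
        self.theta = self.theta - step_size * self.grad_fn(self.theta)
        loss_after = self.loss_fn(self.theta)
        # Reward: how much the objective decreased on this transition.
        return loss_before - loss_after


# Toy objective: f(theta) = ||theta||^2 / 2, so grad f(theta) = theta.
mdp = GradientDescentMDP(grad_fn=lambda t: t,
                         loss_fn=lambda t: 0.5 * float(t @ t),
                         theta0=[1.0, -2.0])

# A fixed-step-size "policy" standing in for the learned planner.
total_reward = sum(mdp.step(0.1) for _ in range(50))
print(total_reward)  # positive: the objective has decreased overall
```

A learned optimizer in this framing would replace the fixed `0.1` with a policy mapping the state (gradient history) to an update, trained to maximize cumulative reward over the optimization trajectory.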
