Research on three-step accelerated gradient algorithm in deep learning

[1]  W. Pitts,et al.  A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.

[2]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[3]  Guanghui Lan,et al.  Optimal Adaptive and Accelerated Stochastic Gradient Descent , 2018, ArXiv.

[4]  Kurt Hornik,et al.  Support Vector Machines in R , 2006 .

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Stefan Fritsch,et al.  Training of Neural Networks [R package neuralnet version 1.44.2] , 2019 .

[7]  B. V. Shah,et al.  Some Algorithms for Minimizing a Function of Several Variables , 1964 .

[8]  Kenji Kawaguchi,et al.  Deep Learning without Poor Local Minima , 2016, NIPS.

[9]  M.N. Vrahatis,et al.  Parallel tangent methods with variable stepsize , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[10]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[11]  Kurt Hornik,et al.  Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.

[12]  Kurt Hornik,et al.  Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien [R package e1071 version 1.7-4] , 2020 .

[13]  Kevin K. Chen,et al.  The Upper Bound on Knots in Neural Networks , 2016, ArXiv.

[14]  Boris Polyak.  Some methods of speeding up the convergence of iteration methods , 1964 .

[15]  H. Borchers.  Practical Numerical Math Functions [R package pracma version 2.2.9] , 2019 .

[16]  Ronald Davis,et al.  Neural networks and deep learning , 2017 .

[17]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[18]  Guanghui Lan.  Convex optimization under inexact first-order information , 2009 .

[19]  Ronald L. Rivest,et al.  Training a 3-node neural network is NP-complete , 1988, COLT '88.

[20]  L. Armijo.  Minimization of functions having Lipschitz continuous first partial derivatives , 1966 .

[21]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[22]  Emile Fiesler,et al.  Neural Networks with Adaptive Learning Rate and Momentum Terms , 1995 .

[23]  Guanghui Lan,et al.  An optimal method for stochastic composite optimization , 2011, Mathematical Programming.

[24]  Claus Nebauer,et al.  Evaluation of convolutional neural networks for visual recognition , 1998, IEEE Trans. Neural Networks.

[25]  George D. Magoulas,et al.  Effective Backpropagation Training with Variable Stepsize , 1997, Neural Networks.