Research on three-step accelerated gradient algorithm in deep learning

Yincai Tang | Yongqiang Lian | Shirong Zhou

[1] W. Pitts et al. A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), 2021, Ideas That Created the Future.

[2] James L. McClelland et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[3] Guanghui Lan et al. Optimal Adaptive and Accelerated Stochastic Gradient Descent, 2018, arXiv.

[4] Kurt Hornik et al. Support Vector Machines in R, 2006.

[5] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[6] Stefan Fritsch et al. Training of Neural Networks [R package neuralnet version 1.44.2], 2019.

[7] B. V. Shah et al. Some Algorithms for Minimizing a Function of Several Variables, 1964.

[8] Kenji Kawaguchi et al. Deep Learning without Poor Local Minima, 2016, NIPS.

[9] M. N. Vrahatis et al. Parallel tangent methods with variable stepsize, 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[10] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[11] Kurt Hornik et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.

[12] Kurt Hornik et al. Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien [R package e1071 version 1.7-4], 2020.

[13] Kevin K. Chen et al. The Upper Bound on Knots in Neural Networks, 2016, arXiv.

[14] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.

[15] H. Borchers. Practical Numerical Math Functions [R package pracma version 2.2.9], 2019.

[16] Ronald Davis et al. Neural networks and deep learning, 2017.

[17] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[18] Guanghui Lan. Convex optimization under inexact first-order information, 2009.

[19] Ronald L. Rivest et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.

[20] L. Armijo. Minimization of functions having Lipschitz continuous first partial derivatives, 1966.

[21] Geoffrey E. Hinton et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[22] Emile Fiesler et al. Neural Networks with Adaptive Learning Rate and Momentum Terms, 1995.

[23] Guanghui Lan et al. An optimal method for stochastic composite optimization, 2011, Mathematical Programming.

[24] Claus Nebauer et al. Evaluation of convolutional neural networks for visual recognition, 1998, IEEE Trans. Neural Networks.

[25] George D. Magoulas et al. Effective Backpropagation Training with Variable Stepsize, 1997, Neural Networks.