Nesterov's accelerated gradient and momentum as approximations to regularised update descent
[1] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[2] Ning Qian et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[3] D. K. Smith et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[4] P. Hammond et al. Essential Mathematics for Economic Analysis, 2002.
[5] Geoffrey E. Hinton et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[6] Geoffrey E. Hinton et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[7] Yann Ollivier et al. Speed learning on the fly, 2015, ArXiv.