[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[2] A. Bray, et al. Statistics of critical points of Gaussian fields on large-dimensional spaces, 2006, Physical Review Letters.
[3] Ernst Hairer, et al. Solving Ordinary Differential Equations I: Nonstiff Problems, 2009.
[4] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[5] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[6] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[9] Léon Bottou. Stochastic Gradient Learning in Neural Networks, 1991.
[10] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[11] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[12] Yan V. Fyodorov, et al. Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity, 2007, cond-mat/0702601.
[13] Pierre Baldi, et al. Complex-Valued Autoencoders, 2011, Neural Networks.
[14] Philipp Hennig, et al. Probabilistic Line Searches for Stochastic Optimization, 2015, NIPS.
[15] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[16] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[17] Graham W. Taylor, et al. Adaptive deconvolutional networks for mid and high level feature learning, 2011, International Conference on Computer Vision (ICCV).
[18] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, arXiv.
[19] C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations, 1970.
[20] J. Butcher. Coefficients for the study of Runge-Kutta integration processes, 1963, Journal of the Australian Mathematical Society.
[21] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[22] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[23] H. Robbins. A Stochastic Approximation Method, 1951.
[24] Andrea Montanari, et al. Convergence rates of sub-sampled Newton methods, 2015, NIPS.
[25] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[26] Kenji Kawaguchi. Deep Learning without Poor Local Minima, 2016, NIPS.
[27] Qiang Chen, et al. Network In Network, 2013, ICLR.
[28] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[29] Pierre Baldi, et al. Linear Learning: Landscapes and Algorithms, 1988, NIPS.
[30] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[31] Razvan Pascanu, et al. Local minima in training of deep networks, 2017, arXiv.
[32] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.