An Empirical Analysis of Deep Network Loss Surfaces

Training a deep neural network is a high-dimensional optimization problem over the model's loss function. Unfortunately, these loss functions are non-convex as well as high-dimensional, and hence difficult to characterize. In this paper, we empirically investigate the geometry of the loss functions of state-of-the-art networks trained with multiple stochastic optimization methods. We do this through several experiments, visualized on polygons in parameter space, to understand how and when these stochastic optimization methods find local minima.
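
The core idea behind such visualizations is to evaluate the loss along low-dimensional slices of parameter space spanned by trained weight configurations. Below is a minimal sketch, not taken from the paper, of the simplest case: sampling the loss along the straight line between two parameter sets. The names `model`, `weights_a`, `weights_b`, `loss_fn`, and `data_loader` are hypothetical placeholders, and the state dicts are assumed to contain only floating-point tensors.

import copy
import torch

def interpolate_loss(model, weights_a, weights_b, loss_fn, data_loader, steps=25):
    """Evaluate the average loss at convex combinations (1 - t) * A + t * B."""
    losses = []
    probe = copy.deepcopy(model)  # scratch copy so the original weights are untouched
    for t in torch.linspace(0.0, 1.0, steps):
        # Blend the two parameter sets and load them into the probe network.
        blended = {k: (1 - t) * weights_a[k] + t * weights_b[k] for k in weights_a}
        probe.load_state_dict(blended)
        probe.eval()
        with torch.no_grad():
            total, n = 0.0, 0
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * x.size(0)
                n += x.size(0)
        losses.append(total / n)
    return losses

Extending the same blending from a line segment to barycentric combinations of three or more weight configurations gives a loss surface over a polygon rather than a curve.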
