Deep learning has enjoyed a resurgence of interest in recent years for applications such as image recognition, speech recognition, and natural language processing. The vast majority of practical deep-learning applications focus on supervised learning, where a supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of its critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, remain poorly understood. It was only recently shown that new results in spin-glass theory may provide an explanation for these questions, by establishing a connection between the loss function of neural networks and the Hamiltonian of the spherical spin-glass model. The connection relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that it may hold in practice. The question we pose is whether some of these assumptions can be dropped to establish a stronger connection between the two models.
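To make the connection concrete, the following is a sketch of the Hamiltonian of the H-spin spherical spin-glass model that this line of work relates to the network loss; the notation $N$, $H$, $X$, and $\mathbf{w}$ is ours, not taken from the abstract:
\[
H_{N,H}(\mathbf{w}) \;=\; \frac{1}{N^{(H-1)/2}} \sum_{i_1,\dots,i_H=1}^{N} X_{i_1,\dots,i_H}\, w_{i_1} w_{i_2} \cdots w_{i_H},
\qquad \sum_{i=1}^{N} w_i^2 = N,
\]
where the couplings $X_{i_1,\dots,i_H}$ are i.i.d. standard Gaussian random variables and the configuration $\mathbf{w}$ is constrained to a sphere. Roughly, the existing connection identifies $H$ with the network depth and the $w_i$ with rescaled network weights, and the possibly unrealistic assumptions mentioned above (for example, independence of the inputs and of the paths through the network) are what allow the loss to be reduced to this form.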