GLOBAL OPTIMALITY CONDITIONS FOR DEEP NEURAL NETWORKS

We study the error landscape of deep linear and nonlinear neural networks with the squared error loss. Minimizing the loss of a deep linear neural network is a nonconvex problem, and despite recent progress, our understanding of this loss surface is still incomplete. For deep linear networks, we present necessary and sufficient conditions for a critical point of the risk function to be a global minimum. Surprisingly, our conditions provide an efficiently checkable test for global optimality, while such tests are typically intractable in nonconvex optimization. We further extend these results to deep nonlinear neural networks and prove similar sufficient conditions for global optimality, albeit in a more limited function space setting.
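To make the flavor of such an efficiently checkable test concrete, here is a minimal numerical sketch in Python, not the paper's exact necessary-and-sufficient conditions: for a deep linear network with squared error, the loss is convex in the end-to-end product W_L⋯W_1, so if that product already attains the optimum of the unconstrained least-squares problem, the candidate point must be a global minimum, because no choice of layer weights can beat that convex benchmark. The helper names and the two-layer toy example are illustrative assumptions, and the check is only a sufficient certificate (it can miss global minima when a narrow layer imposes a binding rank constraint).

```python
import numpy as np

def end_to_end(weights, input_dim):
    """Product W_L ... W_1 of the layer matrices, applied left to right."""
    R = np.eye(input_dim)
    for W in weights:
        R = W @ R
    return R

def deep_linear_loss(weights, X, Y):
    """Squared error loss 0.5 * ||W_L ... W_1 X - Y||_F^2 of a deep linear network."""
    R = end_to_end(weights, X.shape[0])
    return 0.5 * np.linalg.norm(R @ X - Y, 'fro') ** 2

def certifies_global_min(weights, X, Y, tol=1e-8):
    """Illustrative sufficient certificate (not the paper's exact conditions):
    if the end-to-end product attains the loss of the unconstrained convex
    benchmark min_R 0.5 * ||R X - Y||_F^2, no deep linear network can do
    better, so the candidate weights are a global minimum."""
    R_star = Y @ np.linalg.pinv(X)                       # unconstrained least-squares solution
    loss_here = deep_linear_loss(weights, X, Y)
    loss_star = 0.5 * np.linalg.norm(R_star @ X - Y, 'fro') ** 2
    return loss_here <= loss_star + tol

# Toy usage: factor the least-squares solution into two layers and certify it.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))   # 5 input features, 100 samples
Y = rng.standard_normal((3, 100))   # 3 outputs
R_star = Y @ np.linalg.pinv(X)
W1 = np.vstack([np.eye(5), np.zeros((2, 5))])   # 7x5: embed the input into a width-7 hidden layer
W2 = np.hstack([R_star, np.zeros((3, 2))])      # 3x7: chosen so that W2 @ W1 == R_star
print(certifies_global_min([W1, W2], X, Y))     # True
```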
