Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

The optimization problem behind neural networks is highly non-convex. Training with stochastic gradient descent and its variants requires careful hyperparameter tuning and provides no guarantee of reaching the global optimum. In contrast, we show that, under rather weak assumptions on the data, a particular class of feedforward neural networks can be trained to global optimality with a linear convergence rate using our nonlinear spectral method. To the best of our knowledge, this is the first practically feasible method that achieves such a guarantee. While the method can in principle be applied to deep networks, for simplicity we restrict ourselves in this paper to networks with one or two hidden layers. Our experiments confirm that these models are rich enough to achieve good performance on a series of real-world datasets.
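The nonlinear spectral method belongs to the family of fixed-point schemes studied in nonlinear Perron-Frobenius theory, where a suitable order-preserving, homogeneous map is iterated and renormalized until it converges, at a linear rate, to a unique positive fixed point. As a rough conceptual sketch only (the map `phi`, the normalization, and the linear example are assumptions for exposition, not the paper's actual parameter update), such an iteration looks like:

```python
import numpy as np

def nonlinear_power_iteration(phi, x0, max_iter=1000, tol=1e-10):
    """Power-iteration-style fixed-point scheme: repeatedly apply a map phi
    and renormalize. Under the contraction conditions of nonlinear
    Perron-Frobenius theory this converges linearly to a unique positive
    fixed point (conceptual sketch, not the method from the paper)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        x_new = phi(x)
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Illustration: for a linear map phi(x) = A @ x with A entrywise positive,
# this reduces to the classical power method and converges to the Perron
# eigenvector of A.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((5, 5)) + 0.1   # entrywise positive matrix (hypothetical example)
    v = nonlinear_power_iteration(lambda x: A @ x, np.ones(5))
    print(v)
```

In the paper the iteration runs over the network parameters rather than over a vector acted on by a fixed matrix, but the guarantee has the same flavor: a globally attracting fixed point reached at a linear rate.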
