A Smooth Optimisation Perspective on Training Feedforward Neural Networks

We present a smooth optimisation perspective on training multilayer Feedforward Neural Networks (FNNs) in the supervised learning setting. By characterising the critical point conditions of an FNN-based optimisation problem, we identify conditions under which local optima of the cost function can be eliminated. By studying the Hessian structure of the cost function at the global minima, we develop an approximate Newton algorithm for FNN training, which exhibits promising convergence properties.
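To make the approximate Newton idea concrete, the following is a minimal illustrative sketch, not the authors' algorithm: a damped Gauss-Newton step for a one-hidden-layer FNN with a squared-error cost. All variable names, the toy data, and the damping scheme are assumptions introduced for the example.

```python
# Illustrative sketch (assumed setup, not the paper's method): damped
# Gauss-Newton training of a one-hidden-layer FNN with squared-error cost.
import numpy as np

rng = np.random.default_rng(0)

d, m, N = 3, 5, 50                     # input dim, hidden units, samples
X = rng.normal(size=(N, d))
y = np.sin(X @ rng.normal(size=d))     # toy regression targets

# FNN parameters: hidden layer (W1, b1), linear output layer (w2, b2)
W1 = rng.normal(scale=0.5, size=(m, d))
b1 = np.zeros(m)
w2 = rng.normal(scale=0.5, size=m)
b2 = 0.0

def forward(X):
    H = np.tanh(X @ W1.T + b1)         # hidden activations, shape (N, m)
    return H, H @ w2 + b2              # predictions, shape (N,)

def residual_jacobian(X):
    """Jacobian of the per-sample residuals w.r.t. all parameters, shape (N, P)."""
    H, yhat = forward(X)
    S = (1.0 - H**2) * w2              # sensitivity through tanh, shape (N, m)
    J_W1 = S[:, :, None] * X[:, None, :]           # (N, m, d)
    J = np.hstack([J_W1.reshape(len(X), -1),       # d r / d W1
                   S,                               # d r / d b1
                   H,                               # d r / d w2
                   np.ones((len(X), 1))])           # d r / d b2
    return J, yhat

lam = 1e-2                              # Levenberg-style damping (assumed)
for it in range(50):
    J, yhat = residual_jacobian(X)
    r = yhat - y
    # Approximate Newton (Gauss-Newton) step: (J^T J + lam I) delta = -J^T r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), -J.T @ r)
    # Unpack the flat step back into the parameter blocks
    dW1, rest = delta[:m * d].reshape(m, d), delta[m * d:]
    db1, dw2, db2 = rest[:m], rest[m:2 * m], rest[2 * m]
    W1 += dW1; b1 += db1; w2 += dw2; b2 += db2
    print(f"iter {it:2d}  cost {0.5 * np.sum(r**2):.6f}")
```

The Gauss-Newton matrix J^T J stands in for the full Hessian of the squared-error cost; the damping term lam I keeps the linear solve well conditioned away from the global minima, where the Gauss-Newton approximation is loose.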
