Residual Networks: Lyapunov Stability and Convex Decomposition

While the training error of most deep neural networks grows as the depth of the network increases, residual networks appear to be an exception. We show that the main reason for this is the Lyapunov stability of the gradient descent algorithm: for an arbitrarily chosen step size, the equilibria of gradient descent are most likely to remain stable under the residual-network parametrization. We then present an architecture with a pair of residual networks that approximates a large class of functions by decomposing them into a convex and a concave part. Some parameters of this model are shown to change little during training; this imperfect optimization prevents overfitting to the data and leads to solutions with small Lipschitz constants, while providing clues about the generalization of other deep networks.
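To make the stability claim concrete, the following is a standard linearization argument for gradient descent with a fixed step size (a sketch under the usual smoothness assumptions, not a statement taken verbatim from the paper). The iteration is the discrete-time dynamical system

\[
w_{t+1} \;=\; w_t - \delta \,\nabla \ell(w_t),
\]

and an equilibrium $w^\star$ with $\nabla \ell(w^\star) = 0$ is Lyapunov stable only if every eigenvalue of the Jacobian of the update map, $I - \delta\, \nabla^2 \ell(w^\star)$, lies in $[-1, 1]$; at a local minimum this reduces to

\[
\lambda_{\max}\!\left(\nabla^2 \ell(w^\star)\right) \;\le\; \frac{2}{\delta}.
\]

Read in this light, the abstract asserts that for a given $\delta$ the equilibria reached under the residual parametrization tend to satisfy this curvature bound, whereas the corresponding equilibria of plain deep networks need not. For the convex decomposition, one standard construction (not necessarily the one used in the paper) writes any twice-differentiable $f$ on a compact set as a convex part plus a concave part, e.g. $f(x) = \big(f(x) + \tfrac{\rho}{2}\|x\|^2\big) - \tfrac{\rho}{2}\|x\|^2$ with $\rho$ chosen at least as large as $\max\{0, -\lambda_{\min}(\nabla^2 f(x))\}$ over the set; in the proposed architecture, each part is approximated by one of the two residual networks.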
