SPI-Optimizer: An Integral-Separated PI Controller for Stochastic Optimization

To overcome the oscillation problem of classical momentum-based optimizers, recent work relates them to the proportional-integral (PI) controller and artificially adds a derivative (D) term to obtain a PID controller; this suppresses oscillation, but at the cost of an extra hyper-parameter. In this paper, we show that the oscillation stems from the lag effect of the integral (I) term, and we propose SPI-Optimizer, an integral-separated PI controller based optimizer that introduces no extra hyper-parameter. It adaptively separates out the momentum (integral) term whenever the current gradient direction becomes inconsistent with the historical one. Extensive experiments demonstrate that SPI-Optimizer generalizes well across popular network architectures, eliminating the oscillation while achieving faster convergence (up to 40% fewer epochs) and more accurate classification on MNIST, CIFAR-10, and CIFAR-100 (up to 27.5% error reduction) than state-of-the-art methods.
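Restated as an update rule, the abstract's idea is a per-coordinate consistency check between the fresh gradient step (the P term) and the accumulated momentum (the I term). The sketch below is a minimal NumPy illustration of that reading; the function name `spi_step`, the default hyper-parameter values, and the choice to keep accumulating the momentum buffer even when the step is separated are assumptions for illustration, not taken from the paper's reference implementation.

```python
import numpy as np

def spi_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    """One SPI-style update: use the momentum (integral) term only where the
    current gradient step agrees in sign with the accumulated history."""
    # Standard heavy-ball accumulation (the I term in the PI view).
    velocity = momentum * velocity - lr * grad

    # Per-coordinate consistency check: where the fresh gradient step and the
    # accumulated velocity point in opposite directions, fall back to the
    # plain gradient (P-only) step to avoid lag-induced oscillation.
    consistent = np.sign(-grad) == np.sign(velocity)
    update = np.where(consistent, velocity, -lr * grad)

    return theta + update, velocity
```

In a training loop, `theta` and `velocity` would be updated by calling `spi_step` on each fresh stochastic gradient; because the check is element-wise, the separation switches on and off independently for every parameter, which is what lets the method drop the lagging integral contribution without any additional hyper-parameter.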
