论文信息 - Understanding the Role of Momentum in Non-Convex Optimization: Practical Insights from a Lyapunov Analysis

Understanding the Role of Momentum in Non-Convex Optimization: Practical Insights from a Lyapunov Analysis

Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks. Empirically, they out perform traditional stochastic gradient descent (SGD) approaches. In this work we develop a Lyapunov analysis of SGD with momentum (SGD+M), by utilizing a equivalent rewriting of the method known as the stochastic primal averaging (SPA) form. This analysis is much tighter than previous theory in the non-convex case, and due to this we are able to give precise insights into when SGD+M may out-perform SGD, and what hyper-parameter schedules will work and why.

Aaron Defazio

[1] Ashok Cutkosky,et al. Momentum Improves Normalized SGD , 2020, ICML.

[2] Prateek Jain,et al. Accelerating Stochastic Gradient Descent , 2017, COLT.

[3] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .

[4] Prateek Jain,et al. Making the Last Iterate of SGD Information Theoretically Optimal , 2019, COLT.

[5] Yi Yang,et al. A Unified Analysis of Stochastic Momentum Methods for Deep Learning , 2018, IJCAI.

[6] Aaron Defazio,et al. On the convergence of the Stochastic Heavy Ball Method , 2020, ArXiv.

[7] Ali H. Sayed,et al. On the influence of momentum acceleration on online learning , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Deanna Needell,et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm , 2013, Mathematical Programming.

[9] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[10] Aaron Defazio,et al. On the Curved Geometry of Accelerated Optimization , 2018, NeurIPS.

[11] Zhisong Pan,et al. Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence , 2020, IEEE Transactions on Cybernetics.