On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization
Prateek Jain | Sham M. Kakade | Praneeth Netrapalli | Rahul Kidambi