The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure
Sham M. Kakade | Praneeth Netrapalli | Rong Ge | Rahul Kidambi
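A minimal sketch of the schedule named in the title, assuming a fixed horizon of `total_steps` split into roughly log2(total_steps) equal phases, with the step size cut geometrically (here, halved) at each phase boundary. The initial step size `eta0`, the `decay` factor of 0.5, and the equal-phase layout are illustrative assumptions, not the paper's exact constants.

```python
import math

def step_decay_lr(step, total_steps, eta0=1.0, decay=0.5):
    """Sketch of a step decay schedule: split the horizon into roughly
    log2(total_steps) equal phases and multiply the step size by `decay`
    at each phase boundary. `eta0` and `decay` are illustrative defaults,
    not values prescribed by the paper."""
    num_phases = max(1, int(math.log2(total_steps)))
    phase_len = max(1, total_steps // num_phases)
    phase = min(step // phase_len, num_phases - 1)
    return eta0 * (decay ** phase)

# Example: with total_steps = 1024 there are 10 phases of 102 steps each,
# and the step size halves at every phase boundary.
for t in [0, 102, 204, 1023]:
    print(t, step_decay_lr(t, 1024))
```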