暂无分享,去创建一个
Rong Jin | Tianbao Yang | Wotao Yin | Zhishuai Guo | Yi Xu | Tianbao Yang | Yi Xu | Rong Jin | W. Yin | Zhishuai Guo
[1] Ashok Cutkosky,et al. Momentum Improves Normalized SGD , 2020, ICML.
[2] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[3] Xiaoxia Wu,et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization , 2018, ICML.
[4] Mengdi Wang,et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions , 2014, Mathematical Programming.
[5] Francesco Orabona,et al. Momentum-Based Variance Reduction in Non-Convex SGD , 2019, NeurIPS.
[6] Xu Sun,et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate , 2019, ICLR.
[7] Quoc V. Le,et al. Adding Gradient Noise Improves Learning for Very Deep Networks , 2015, ArXiv.
[8] Ji Liu,et al. Gradient Sparsification for Communication-Efficient Distributed Optimization , 2017, NeurIPS.
[9] Francis Bach,et al. A Simple Convergence Proof of Adam and Adagrad , 2020 .
[10] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[11] Dan Alistarh,et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning , 2017, ICML.
[12] Mingrui Liu,et al. Adam+: A Stochastic Method with Adaptive Variance Reduction , 2020, ArXiv.
[13] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[14] Li Shen,et al. On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks , 2018, ArXiv.
[15] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[16] Jinghui Chen,et al. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks , 2018, IJCAI.
[17] Tianbao Yang,et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization , 2016, 1604.03257.
[18] Lam M. Nguyen,et al. ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization , 2019, J. Mach. Learn. Res..
[19] Rong Jin,et al. On Stochastic Moving-Average Estimators for Non-Convex Optimization , 2021, ArXiv.
[20] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[21] Yingbin Liang,et al. SpiderBoost and Momentum: Faster Variance Reduction Algorithms , 2019, NeurIPS.
[22] Sanjiv Kumar,et al. Adaptive Methods for Nonconvex Optimization , 2018, NeurIPS.
[23] Wotao Yin,et al. An Improved Analysis of Stochastic Gradient Descent with Momentum , 2020, NeurIPS.
[24] Tong Zhang,et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.
[25] Mingyi Hong,et al. RMSprop converges with proper hyper-parameter , 2021, ICLR.
[26] Francesco Orabona,et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes , 2018, AISTATS.
[27] Mingyi Hong,et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization , 2018, ICLR.
[28] Rong Jin,et al. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization , 2019, ICML.
[29] Saeed Ghadimi,et al. A Single Timescale Stochastic Approximation Method for Nested Stochastic Optimization , 2018, SIAM J. Optim..
[30] Mengdi Wang,et al. Accelerating Stochastic Composition Optimization , 2016, NIPS.
[31] Li Shen,et al. A Sufficient Condition for Convergences of Adam and RMSProp , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.