Mingrui Liu | Tianbao Yang | Wei Zhang | Francesco Orabona
[1] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[3] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018, ICLR.
[4] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[5] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[6] Ashok Cutkosky, et al. Momentum Improves Normalized SGD, 2020, ICML.
[7] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[8] Michael I. Jordan, et al. Non-convex Finite-Sum Optimization Via SCSG Methods, 2017, NIPS.
[9] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[10] James Demmel, et al. Reducing BERT Pre-Training Time from 3 Days to 76 Minutes, 2019, arXiv.
[11] Xiaodong Cui, et al. Distributed Deep Learning Strategies for Automatic Speech Recognition, 2019, ICASSP.
[12] Mengdi Wang, et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions, 2014, Mathematical Programming.
[13] Francesco Orabona, et al. Momentum-Based Variance Reduction in Non-Convex SGD, 2019, NeurIPS.
[14] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[15] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[16] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[17] Xiaodong Cui, et al. English Conversational Telephone Speech Recognition by Humans and Machines, 2017, INTERSPEECH.
[18] Jinghui Chen, et al. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks, 2018, IJCAI.
[19] Enhong Chen, et al. Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions, 2018, ICLR.
[20] Quanquan Gu, et al. Stochastic Nested Variance Reduction for Nonconvex Optimization, 2018, J. Mach. Learn. Res.
[21] Mingyi Hong, et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, 2018, ICLR.
[22] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[23] Lam M. Nguyen, et al. ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization, 2019, J. Mach. Learn. Res.
[24] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[25] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[26] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[27] Suvrit Sra, et al. On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions, 2020, arXiv.
[28] Yang You, et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, arXiv.
[29] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[30] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[31] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[32] Yair Carmon, et al. Accelerated Methods for Nonconvex Optimization, 2018, SIAM J. Optim.
[33] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[34] Zhouchen Lin, et al. Sharp Analysis for Nonconvex SGD Escaping from Saddle Points, 2019, COLT.
[35] Nathan Srebro, et al. Lower Bounds for Non-Convex Stochastic Optimization, 2019, arXiv.
[36] Xu Sun, et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate, 2019, ICLR.
[37] Kfir Y. Levy, et al. Online to Offline Conversions, Universality and Adaptive Minibatch Sizes, 2017, NIPS.
[38] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[39] Zeyuan Allen-Zhu, et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.
[40] Yingbin Liang, et al. SpiderBoost and Momentum: Faster Variance Reduction Algorithms, 2019, NeurIPS.