[1] Understanding the Role of Momentum in Non-Convex Optimization: Practical Insights from a Lyapunov Analysis, 2020, arXiv.
[2] Nicholas J. A. Harvey, et al. Tight Analyses for Non-Smooth Stochastic Gradient Descent, 2018, COLT.
[3] Ashok Cutkosky. Anytime Online-to-Batch, Optimism and Acceleration, 2019, ICML.
[4] Yurii Nesterov. Primal-dual subgradient methods for convex problems, 2005, Math. Program.
[5] Qing Tao, et al. The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods, 2021, ICLR.
[6] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[7] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[8] Aaron Defazio, et al. Dual Averaging is Surprisingly Effective for Deep Learning Optimization, 2020, arXiv.
[9] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[10] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964, USSR Computational Mathematics and Mathematical Physics.
[11] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[12] Tianbao Yang, et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization, 2016, arXiv:1604.03257.
[13] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[14] Deanna Needell, et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, 2013, Mathematical Programming.
[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[16] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[17] Prateek Jain, et al. Accelerating Stochastic Gradient Descent, 2017, COLT.
[18] A. S. Nemirovsky, et al. Problem Complexity and Method Efficiency in Optimization, 1983, Wiley.
[19] Volkan Cevher, et al. A new regret analysis for Adam-type algorithms, 2020, ICML.
[20] Zhisong Pan, et al. Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence, 2020, IEEE Transactions on Cybernetics.
[21] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[22] Francis R. Bach, et al. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression, 2016, J. Mach. Learn. Res.
[23] Francesco Orabona, et al. Momentum-Based Variance Reduction in Non-Convex SGD, 2019, NeurIPS.
[24] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.
[25] Martin J. Wainwright, et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, 2010, IEEE Transactions on Information Theory.
[26] Xu Sun, et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate, 2019, ICLR.
[27] Shai Shalev-Shwartz. Online learning: theory, algorithms and applications, PhD thesis, The Hebrew University of Jerusalem, 2007.
[28] Yurii Nesterov, et al. Quasi-monotone Subgradient Methods for Nonsmooth Convex Minimization, 2015, J. Optim. Theory Appl.
[29] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[30] Francesco Orabona. A Modern Introduction to Online Learning, 2019, arXiv.