Linear Convergence of Adaptive Stochastic Gradient Descent
[1] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[2] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[3] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[4] Matthias Hein, et al. Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, 2017, ICML.
[5] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[6] Jorge Nocedal, et al. A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large Scale Optimization, 1991, SIAM J. Optim.
[7] Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.
[8] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[9] Mark W. Schmidt, et al. Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition, 2013, ArXiv:1308.6370.
[10] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[11] Anastasios Kyrillidis, et al. Minimum norm solutions do not always generalize well for over-parameterized problems, 2018, ArXiv.
[12] Yi Zhou, et al. SGD Converges to Global Minimum in Deep Learning via Star-convex Path, 2019, ICLR.
[13] Philipp Hennig, et al. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients, 2017, ICML.
[14] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.
[15] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[16] Martin J. Wainwright. High-Dimensional Statistics, 2019.
[17] Haipeng Luo, et al. Accelerated Parallel Optimization Methods for Large Scale Machine Learning, 2014, ArXiv.
[18] Xiaoxia Wu, et al. WNGrad: Learn the Learning Rate in Gradient Descent, 2018, ArXiv.
[19] Xiaoxia Wu, et al. AdaGrad stepsizes: sharp convergence over nonconvex landscapes, from any initialization, 2019, ArXiv.
[20] Li Shen, et al. A Sufficient Condition for Convergences of Adam and RMSProp, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[22] Kfir Y. Levy, et al. Online to Offline Conversions, Universality and Adaptive Minibatch Sizes, 2017, NIPS.
[23] Léon Bottou, et al. Diagonal Rescaling For Neural Networks, 2017, ArXiv.
[24] Anastasios Kyrillidis, et al. Minimum weight norm models do not always generalize well for over-parameterized problems, 2018.
[25] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[26] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[27] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[28] Simon Haykin, et al. Cognitive radio: brain-empowered wireless communications, 2005, IEEE Journal on Selected Areas in Communications.
[29] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[30] Yann LeCun, et al. Large Scale Online Learning, 2003, NIPS.
[31] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[32] Volkan Cevher, et al. On the linear convergence of the stochastic gradient method with constant step-size, 2017, Optim. Lett.
[33] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[34] Yuan Cao, et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, 2018, ArXiv.
[35] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[36] Deanna Needell, et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, 2013, Mathematical Programming.
[37] K. Schittkowski, et al. Nonlinear Programming, 2022.
[38] Xiaoxia Wu, et al. Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network, 2019, ArXiv.
[39] Mark W. Schmidt, et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets, 2012, NIPS.
[40] Enhong Chen, et al. SADAGRAD: Strongly Adaptive Stochastic Gradient Methods, 2018, ICML.
[41] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[42] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[43] Sanjiv Kumar, et al. Escaping Saddle Points with Adaptive Gradient Methods, 2019, ICML.
[44] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[45] Volkan Cevher, et al. Online Adaptive Methods, Universality and Acceleration, 2018, NeurIPS.
[46] Raef Bassily, et al. On exponential convergence of SGD in non-convex over-parametrized learning, 2018, ArXiv.
[47] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.