A Simple Convergence Proof of Adam and Adagrad
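For context, the paper concerns the two standard adaptive update rules sketched below. This is the textbook formulation (all operations coordinate-wise, with $g_t$ the stochastic gradient at step $t$), not necessarily the paper's exact notation or step-size schedule.

Adagrad:
  $x_{t+1} = x_t - \alpha \, \frac{g_t}{\sqrt{\epsilon + \sum_{s=1}^{t} g_s^2}}$

Adam (with bias corrections $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$):
  $m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$
  $v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$
  $x_{t+1} = x_t - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$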
[1] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[2] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, J. Mach. Learn. Res., 2011.
[3] Tianbao Yang, et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization, arXiv:1604.03257, 2016.
[4] Michael I. Jordan, et al. Estimation, Optimization, and Parallelism when Data is Sparse, NIPS, 2013.
[5] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, NIPS, 2017.
[6] Li Shen, et al. Weighted AdaGrad with Unified Momentum, 2018.
[7] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL, 2019.
[8] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, ICML, 2018.
[9] Yuan Cao, et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, arXiv, 2018.
[10] Boris Ginsburg, et al. Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification, arXiv, 2017.
[11] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, AISTATS, 2018.
[12] Ilya Sutskever, et al. Jukebox: A Generative Model for Music, arXiv, 2020.
[13] Nicolas Usunier, et al. Canonical Tensor Decomposition for Knowledge Base Completion, ICML, 2018.
[14] Mingyi Hong, et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, ICLR, 2018.
[15] Diego Klabjan, et al. Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network, arXiv, 2019.
[16] Li Shen, et al. A Sufficient Condition for Convergences of Adam and RMSProp, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[17] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, COLT, 2010.
[18] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.