RMSprop converges with proper hyper-parameter
Mingyi Hong | Ruoyu Sun | Naichen Shi | Dawei Li