暂无分享,去创建一个
[1] Understanding the Role of Momentum in Non-Convex Optimization: Practical Insights from a Lyapunov Analysis , 2020, ArXiv.
[2] Olatunji Ruwase,et al. ZeRO: Memory optimizations Toward Training Trillion Parameter Models , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Aaron Defazio,et al. The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization , 2020 .
[5] Francis Bach,et al. A Simple Convergence Proof of Adam and Adagrad , 2020 .
[6] Aaron Defazio. Offset Sampling Improves Deep Learning based Accelerated MRI Reconstructions by Exploiting Symmetry , 2019 .
[7] Aaron Defazio,et al. On the convergence of the Stochastic Heavy Ball Method , 2020, ArXiv.
[8] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[9] Kamyar Azizzadenesheli,et al. signSGD: compressed optimisation for non-convex problems , 2018, ICML.
[10] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..
[11] Yu. Nesterov,et al. Quasi-monotone Subgradient Methods for Nonsmooth Convex Minimization , 2015, J. Optim. Theory Appl..
[12] Philipp Hennig,et al. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients , 2017, ICML.
[13] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[14] Li Shen,et al. Weighted AdaGrad with Unified Momentum , 2018 .
[15] Aaron Defazio,et al. End-to-End Variational Networks for Accelerated MRI Reconstruction , 2020, MICCAI.
[16] Y. Nesterov. Primal-Dual Subgradient Methods for Convex Problems , 2005 .
[17] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[18] Nathan Srebro,et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning , 2017, NIPS.
[19] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[20] Jaehoon Lee,et al. On Empirical Comparisons of Optimizers for Deep Learning , 2019, ArXiv.
[21] Volkan Cevher,et al. Online Adaptive Methods, Universality and Acceleration , 2018, NeurIPS.
[22] Pascal Vincent,et al. fastMRI: An Open Dataset and Benchmarks for Accelerated MRI , 2018, ArXiv.
[23] Francesco Orabona,et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes , 2018, AISTATS.
[24] Marcello Federico,et al. Report on the 10th IWSLT evaluation campaign , 2013, IWSLT.
[25] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[26] Xiaoxia Wu,et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization , 2018, ICML.
[27] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[28] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[29] Alexander M. Rush,et al. Sequence-to-Sequence Learning as Beam-Search Optimization , 2016, EMNLP.
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] Yuan Cao,et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization , 2018, ArXiv.
[32] Philipp Hennig,et al. Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers , 2020, ICML.
[33] Li Shen,et al. A Sufficient Condition for Convergences of Adam and RMSProp , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).