Jingzhao Zhang | Haochuan Li | Suvrit Sra | Ali Jadbabaie
[1] Alistair Letcher, et al. On the Impossibility of Global Convergence in Multi-Loss Optimization, 2020, ICLR.
[2] Umut Simsekli, et al. The Heavy-Tail Phenomenon in SGD, 2020, arXiv.
[3] Ross Wightman, et al. ResNet strikes back: An improved training procedure in timm, 2021, arXiv.
[4] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[5] Léon Bottou, et al. On the Ineffectiveness of Variance Reduced Optimization for Deep Learning, 2018, NeurIPS.
[6] Ankit Singh Rawat, et al. On the Reproducibility of Neural Network Predictions, 2021, arXiv.
[7] Surya Ganguli, et al. The Limiting Dynamics of SGD: Modified Loss, Phase-Space Oscillations, and Anomalous Diffusion, 2021, Neural Computation.
[8] Suvrit Sra, et al. Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity, 2019, ICLR.
[9] Pranava Madhyastha, et al. On Model Stability as a Function of Random Seed, 2019, CoNLL.
[10] Sanjeev Arora, et al. Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate, 2020, NeurIPS.
[11] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[12] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective, 2018.
[13] Andrey Malinin, et al. On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay, 2021, NeurIPS.
[14] Yun Kuen Cheung, et al. Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games, 2019, COLT.
[15] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[16] Georgios Piliouras, et al. No-regret learning and mixed Nash equilibria: They do not mix, 2020, NeurIPS.
[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Georgios Piliouras, et al. Game dynamics as the meaning of a game, 2019, SECO.
[19] Ameet Talwalkar, et al. Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability, 2021, ICLR.
[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[21] Panayotis Mertikopoulos, et al. On the convergence of single-call stochastic extra-gradient methods, 2019, NeurIPS.
[22] Michael I. Jordan, et al. Stochastic Gradient and Langevin Processes, 2019, ICML.
[23] A. Jadbabaie, et al. On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions, 2020, arXiv.