Rachel Ward | Xiaoxia Wu | Yuege Xie | Simon Du
[1] Dewa Made Sri Arsa, et al. Fake News Dataset, 2021.
[2] Francesco Orabona, et al. Scale-Free Algorithms for Online Linear Optimization, 2015, ALT.
[3] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[4] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[5] Matthias Hein, et al. Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, 2017, ICML.
[6] Francis Bach, et al. On the Convergence of Adam and Adagrad, 2020, ArXiv.
[7] Xiaoxia Wu, et al. Global Convergence of Adaptive Gradient Methods for an Over-parameterized Neural Network, 2019, ArXiv.
[8] Mingyi Hong, et al. On the Convergence of a Class of Adam-Type Algorithms for Non-Convex Optimization, 2018, ICLR.
[9] Xiaoxia Wu, et al. Linear Convergence of Adaptive Stochastic Gradient Descent, 2019, AISTATS.
[10] Xiaoxia Wu, et al. WNGrad: Learn the Learning Rate in Gradient Descent, 2018, ArXiv.
[11] Sanjiv Kumar, et al. Adaptive Methods for Nonconvex Optimization, 2018, NeurIPS.
[12] Shiqian Ma, et al. Barzilai-Borwein Step Size for Stochastic Gradient Descent, 2016, NIPS.
[13] Li Shen, et al. A Sufficient Condition for Convergences of Adam and RMSProp, 2019, CVPR.
[14] Jorge Nocedal, et al. A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large Scale Optimization, 1991, SIAM J. Optim.
[15] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[16] Kfir Y. Levy, et al. Online to Offline Conversions, Universality and Adaptive Minibatch Sizes, 2017, NIPS.
[17] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[18] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[19] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[20] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[21] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[22] K. Schittkowski, et al. Nonlinear Programming, 2022.
[23] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[24] Volkan Cevher, et al. Online Adaptive Methods, Universality and Acceleration, 2018, NeurIPS.
[25] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[26] Jinghui Chen, et al. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks, 2018, IJCAI.
[27] Yurii Nesterov, et al. Smooth Minimization of Non-smooth Functions, 2005, Math. Program.
[28] Xiaoxia Wu, et al. AdaGrad Stepsizes: Sharp Convergence over Nonconvex Landscapes, from Any Initialization, 2018, ICML.
[29] Simon Haykin, et al. Cognitive Radio: Brain-Empowered Wireless Communications, 2005, IEEE Journal on Selected Areas in Communications.
[30] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2018, CVPR.
[31] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[32] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[33] Alexandros G. Dimakis, et al. Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification, 2018, MLSys.
[34] Yuan Cao, et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, 2018, ArXiv.