Yi Zhang | Naman Agarwal | Elad Hazan | Xinyi Chen | Cyril Zhang
[1] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[2] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[3] Yoram Singer, et al. A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization, 2017, ArXiv.
[4] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[5] Yoram Singer, et al. Shampoo: Preconditioned Stochastic Tensor Optimization, 2018, ICML.
[6] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[7] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2019, ArXiv.
[8] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[9] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[10] Maya R. Gupta, et al. Training highly multiclass classifiers, 2014, J. Mach. Learn. Res.
[11] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[12] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.
[13] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[14] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[15] Quoc V. Le, et al. Neural Optimizer Search with Reinforcement Learning, 2017, ICML.
[16] Jimmy Ba, et al. Kronecker-factored Curvature Approximations for Recurrent Neural Networks, 2018, ICLR.
[17] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[18] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[19] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[20] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Yi Zhang, et al. The Case for Full-Matrix Adaptive Regularization, 2018, ArXiv.
[23] Jinghui Chen, et al. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks, 2018, IJCAI.