Lookahead Converges to Stationary Points of Smooth Non-convex Functions
Jianyu Wang | Nicolas Ballas | Michael G. Rabbat | Vinayak Tantia
[1] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[2] Ali H. Sayed, et al. On the Learning Behavior of Adaptive Networks—Part I: Transient Analysis, 2013, IEEE Transactions on Information Theory.
[3] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[4] Michael G. Rabbat, et al. Push-Sum Distributed Dual Averaging for convex optimization, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[5] Ali Sayed, et al. Adaptation, Learning, and Optimization over Networks, 2014, Found. Trends Mach. Learn.
[6] Ali H. Sayed, et al. Distributed Learning in Non-Convex Environments—Part I: Agreement at a Linear Rate, 2019, IEEE Transactions on Signal Processing.
[7] Geoffrey E. Hinton, et al. Lookahead Optimizer: k steps forward, 1 step back, 2019, NeurIPS.
[8] Michael G. Rabbat, et al. Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization, 2017, Proceedings of the IEEE.
[9] Angelia Nedic, et al. Distributed optimization over time-varying directed graphs, 2013, 52nd IEEE Conference on Decision and Control.
[10] Jianyu Wang, et al. Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[11] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Michael G. Rabbat, et al. Stochastic Gradient Push for Distributed Deep Learning, 2018, ICML.
[14] Jianyu Wang, et al. SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum, 2020, ICLR.
[15] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[16] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.