Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond