[1] Preetum Nakkiran. Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, 2020, arXiv.
[2] Sho Yaida, et al. Fluctuation-dissipation relations for stochastic gradient descent, 2018, ICLR.
[3] Aitor Lewkowycz, et al. On the training dynamics of deep networks with L2 regularization, 2020, NeurIPS.
[4] Twan van Laarhoven, et al. L2 Regularization versus Batch and Weight Normalization, 2017, arXiv.
[5] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[6] Naman Agarwal, et al. Disentangling Adaptive Gradient Methods from Learning Rates, 2020, arXiv.
[7] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[8] Sanjeev Arora, et al. Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate, 2020, NeurIPS.
[9] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[10] Hossein Mobahi, et al. Sharpness-Aware Minimization for Efficiently Improving Generalization, 2020, arXiv.
[11] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, arXiv.
[12] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.
[13] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[14] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[15] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.
[16] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.
[17] Jian Sun, et al. Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD, 2020.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[19] Jascha Sohl-Dickstein, et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.