On Monotonic Linear Interpolation of Neural Network Parameters
James Lucas | Juhan Bae | Michael R. Zhang | Stanislav Fort | Richard S. Zemel | Roger B. Grosse
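The monotonic linear interpolation (MLI) property referenced in the title is the empirical observation, going back to Goodfellow et al. [17], that the training loss evaluated along the straight line in parameter space from the initialization to the trained solution often decreases monotonically. A minimal sketch of how such an interpolation curve can be computed is given below; it assumes PyTorch, and the names `interpolation_curve`, `model`, `theta_init`, `theta_final`, `loss_fn`, and `data_loader` are illustrative placeholders rather than anything defined by the paper.

```python
# Minimal sketch of evaluating the loss along the linear path
# theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final.
# Assumes loss_fn returns the mean loss over a batch; all names are
# hypothetical placeholders, not part of the paper's code.
import copy
import torch

def interpolation_curve(model, theta_init, theta_final, loss_fn, data_loader, steps=25):
    """Average loss at evenly spaced alpha in [0, 1] along the straight
    line between the initial and final parameter vectors."""
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Copy the model and overwrite its parameters with the interpolated values.
        interp = copy.deepcopy(model)
        with torch.no_grad():
            for p, p0, p1 in zip(interp.parameters(), theta_init, theta_final):
                p.copy_((1.0 - alpha) * p0 + alpha * p1)
        interp.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                total += loss_fn(interp(x), y).item() * x.size(0)
                count += x.size(0)
        losses.append(total / count)
    # MLI holds if this sequence is (approximately) non-increasing.
    return losses
```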
[1] Gintare Karolina Dziugaite, et al. Stabilizing the Lottery Ticket Hypothesis, 2019.
[2] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[3] Guodong Zhang, et al. Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model, 2019, NeurIPS.
[4] Gilad Yehudai, et al. Proving the Lottery Ticket Hypothesis: Pruning is All You Need, 2020, ICML.
[5] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.
[6] Quynh Nguyen, et al. On Connected Sublevel Sets in Deep Learning, 2019, ICML.
[7] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[8] Gintare Karolina Dziugaite, et al. Linear Mode Connectivity and the Lottery Ticket Hypothesis, 2019, ICML.
[9] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[10] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[11] Shun-ichi Amari, et al. When Does Preconditioning Help or Hurt Generalization?, 2021, ICLR.
[12] Stanislav Fort, et al. The Goldilocks zone: Towards better understanding of neural network loss landscapes, 2018, AAAI.
[13] Geoffrey E. Hinton, et al. Lookahead Optimizer: k steps forward, 1 step back, 2019, NeurIPS.
[14] Marco Mondelli, et al. Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks, 2020, ICML.
[15] Jason Yosinski, et al. Measuring the Intrinsic Dimension of Objective Landscapes, 2018, ICLR.
[16] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.
[17] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[18] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[19] Andrew Gordon Wilson, et al. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs, 2018, NeurIPS.
[20] Sanjeev Arora, et al. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets, 2019, NeurIPS.
[21] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[22] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, ArXiv.
[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[24] Surya Ganguli, et al. Emergent properties of the local geometry of neural loss landscapes, 2019, ArXiv.
[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR 2016.
[27] Ethan Dyer, et al. Gradient Descent Happens in a Tiny Subspace, 2018, ArXiv.
[28] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[29] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[30] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[31] Renjie Liao, et al. Understanding Short-Horizon Bias in Stochastic Meta-Optimization, 2018, ICLR.
[32] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[33] Fred A. Hamprecht, et al. Essentially No Barriers in Neural Network Energy Landscape, 2018, ICML.
[34] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.