Why flatness does and does not correlate with generalization for deep neural networks
Shuofeng Zhang | Isaac Reid | Guillermo Valle Pérez | Ard A. Louis
[1] Zhanxing Zhu, et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes, 2017, ArXiv.
[2] Yoshua Bengio, et al. Finding Flatter Minima with SGD, 2018, ICLR.
[3] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[4] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[5] Cristian Sminchisescu, et al. A Reparameterization-Invariant Flatness Measure for Deep Neural Networks, 2019, ArXiv.
[6] Geoffrey E. Hinton, et al. Keeping Neural Networks Simple, 1993.
[7] Ard A. Louis, et al. Is SGD a Bayesian sampler? Well, almost, 2020, ArXiv.
[8] Yao Zhang, et al. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning, 2018, Molecular Physics.
[9] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[10] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[11] Masashi Sugiyama, et al. Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis, 2019, ICML.
[12] Trac D. Tran, et al. A Scale Invariant Flatness Measure for Deep Network Minima, 2019, ArXiv.
[13] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[14] David J. Schwab, et al. How noise affects the Hessian spectrum in overparameterized neural networks, 2019, ArXiv.
[15] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[16] Kurt Keutzer, et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, 2018, NeurIPS.
[17] J. Rissanen. Modeling by Shortest Data Description, 1978, Automatica.
[18] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.