SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
Wei Wen | Yandan Wang | Feng Yan | Chunpeng Wu | Cong Xu | Yiran Chen | Hai Li