ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon | Jeongseop Kim | Hyunseo Park | In Kwon Choi