Liangli Zhen | Jiawei Du | Jiashi Feng | Joey Tianyi Zhou | Rick Siow Mong Goh | Hanshu Yan | Vincent Y. F. Tan