SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks