SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks
[1] Hermann Ney, et al. Mean-normalized stochastic gradient for large-scale deep learning, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Tomaso Poggio, et al. Generalization in deep network classifiers trained with the square loss, 2020.
[3] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[4] Ross B. Girshick, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[5] Zhanxing Zhu, et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects, 2018, ICML.
[6] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[7] Matus Telgarsky, et al. Directional convergence and alignment in deep learning, 2020, NeurIPS.
[8] Murad Tukan, et al. No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks, 2021, Sensors.
[9] Kaifeng Lyu, et al. Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning, 2020, ICLR.
[10] Pierre H. Richemond, et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, 2020, NeurIPS.
[11] Zhiyuan Zhang, et al. Understanding and Improving Layer Normalization, 2019, NeurIPS.
[12] Sanjeev Arora, et al. What Happens after SGD Reaches Zero Loss? A Mathematical Framework, 2021, ICLR.
[13] Mathieu Salzmann, et al. Compression-aware Training of Deep Networks, 2017, NIPS.
[14] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[15] Sanjeev Khudanpur, et al. Parallel training of DNNs with Natural Gradient and Parameter Averaging, 2014.
[16] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[17] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[18] G. Montavon, et al. XAI for Transformers: Better Explanations through Conservative Propagation, 2022, ICML.
[19] Nadav Cohen, et al. Implicit Regularization in Deep Learning May Not Be Explainable by Norms, 2020, NeurIPS.
[20] Pulkit Agrawal, et al. The Low-Rank Simplicity Bias in Deep Networks, 2021, ArXiv.
[21] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[22] Bernhard Pfahringer, et al. Regularisation of neural networks by enforcing Lipschitz continuity, 2018, Machine Learning.
[23] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[24] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.
[25] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[26] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[27] Dacheng Tao, et al. On Compressing Deep Models by Low Rank and Sparse Decomposition, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Yoshua Bengio, et al. Object Recognition with Gradient-Based Learning, 1999, Shape, Contour and Grouping in Computer Vision.
[29] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[30] Sergey Ioffe, et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models, 2017, NIPS.
[31] Tapani Raiko, et al. Deep Learning Made Easier by Linear Transformations in Perceptrons, 2012, AISTATS.
[32] Qianli Liao, et al. Theoretical issues in deep networks, 2020, Proceedings of the National Academy of Sciences.
[34] Joan Bruna, et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, 2014, NIPS.
[35] O. Shamir, et al. Implicit Regularization Towards Rank Minimization in ReLU Networks, 2022, ALT.
[36] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.