Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss
Mengjia Xu | Qianli Liao | Andrzej Banburski | Akshay Rangamani | Tomaso Poggio
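For context on the title, "neural collapse" refers to the within-class variability collapse of penultimate-layer features reported by Papyan, Han and Donoho [17]. The snippet below is a minimal illustrative sketch, not code from the paper: it computes the standard NC1 metric Tr(Σ_W Σ_B^+)/C on synthetic features, where the toy data, function name, and all parameters are assumptions made for illustration.

```python
import numpy as np

def nc1_metric(features, labels):
    # NC1: within-class variability relative to between-class variability,
    # Tr(Sigma_W @ pinv(Sigma_B)) / C, following Papyan, Han & Donoho [17].
    classes = np.unique(labels)
    C = len(classes)
    n, d = features.shape
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c in classes:
        xc = features[labels == c]
        mu_c = xc.mean(axis=0)
        centered = xc - mu_c
        sigma_w += centered.T @ centered / n          # within-class covariance
        diff = (mu_c - global_mean)[:, None]
        sigma_b += diff @ diff.T / C                  # between-class covariance
    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / C

# Toy check: features tightly clustered around their class means (a stand-in
# for collapsed penultimate-layer activations) give a small NC1 value.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 10, 64, 100
means = rng.normal(size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = means[labels] + 0.05 * rng.normal(size=(len(labels), dim))
print(f"NC1 ≈ {nc1_metric(features, labels):.4f}")
```

As training with the square loss enters its terminal phase, this quantity is expected to decrease toward zero, which is the empirical signature the neural collapse literature cited below measures.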
[1] Tomaso Poggio, et al. Loss landscape: SGD has a better view, 2020.
[2] Shun-ichi Amari. Understand it in 5 minutes!? A skim-read of a famous paper: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[3] Hossein Mobahi, et al. Predicting the Generalization Gap in Deep Networks with Margin Distributions, 2018, ICLR.
[4] Qianli Liao, et al. Theoretical issues in deep networks, 2020, Proceedings of the National Academy of Sciences.
[5] Yaim Cooper. Global Minima of Overparameterized Neural Networks, 2021, SIAM J. Math. Data Sci.
[6] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[7] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[8] Tomaso Poggio, et al. Loss landscape: SGD can have a better view than GD, 2020.
[9] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[10] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[11] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[12] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[13] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.
[14] Gábor Lugosi, et al. Introduction to Statistical Learning Theory, 2004, Advanced Lectures on Machine Learning.
[15] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[16] Mikhail Belkin, et al. Classification vs regression in overparameterized regimes: Does the loss function matter?, 2020, J. Mach. Learn. Res.
[17] David L. Donoho, et al. Prevalence of neural collapse during the terminal phase of deep learning training, 2020, Proceedings of the National Academy of Sciences.
[18] Mert Pilanci, et al. Revealing the Structure of Deep Neural Networks via Convex Duality, 2020.
[19] Tomaso Poggio, et al. Everything old is new again: a fresh look at historical approaches in machine learning, 2002.
[20] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[21] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[22] Tomaso Poggio, et al. Generalization in deep network classifiers trained with the square loss, 2020.
[23] Amit Daniely, et al. The Implicit Bias of Depth: How Incremental Learning Drives Generalization, 2020, ICLR.
[24] Benjamin Recht, et al. Interpolating Classifiers Make Few Mistakes, 2021, J. Mach. Learn. Res.
[25] Yi Zhou, et al. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?, 2018.
[26] David G.T. Barrett, et al. Implicit Gradient Regularization, 2020, ArXiv.
[27] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[28] Dustin G. Mixon, et al. Neural collapse with unconstrained features, 2020, Sampling Theory, Signal Processing, and Data Analysis.
[29] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[30] Hangfeng He, et al. Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks, 2021, ArXiv.
[31] T. Poggio, et al. Deep vs. shallow networks: An approximation theory perspective, 2016, ArXiv.
[32] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[33] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[34] E. Weinan, et al. On the emergence of tetrahedral symmetry in the final and penultimate layers of neural network classifiers, 2020, ArXiv.
[35] Quynh Nguyen, et al. On Connected Sublevel Sets in Deep Learning, 2019, ICML.
[36] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[37] Surya Ganguli, et al. Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics, 2021, ICLR.
[38] Mikhail Belkin, et al. Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks, 2020, ICLR.
[39] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.