Deep Classifiers trained with the Square Loss
T. Poggio, Q. Liao, Andrzej Banburski, Mengjia Xu, Akshay Rangamani, Tomer Galanti