Mohammad Pezeshki | Amartya Mitra | Yoshua Bengio | Guillaume Lajoie
[1] Edgar Dobriban, et al. The Implicit Regularization of Stochastic Gradient Flow for Least Squares, 2020, ICML.
[2] S. Bös. Statistical Mechanics Approach to Early Stopping and Weight Decay, 1998.
[3] Shun-ichi Amari. Understand It in 5 Minutes!? Skimming a Famous Paper: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[4] Jason D. Lee, et al. Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks, 2019, ICLR.
[5] Florent Krzakala, et al. Statistical physics-based reconstruction in compressed sensing, 2011, ArXiv.
[6] Sompolinsky, et al. Statistical mechanics of learning from examples, 1992, Physical Review A.
[7] Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.
[8] Yoshua Bengio, et al. On the Spectral Bias of Neural Networks, 2018, ICML.
[9] Fred Zhang, et al. SGD on Neural Networks Learns Functions of Increasing Complexity, 2019, NeurIPS.
[10] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[11] Aaron C. Courville, et al. Gradient Starvation: A Learning Proclivity in Neural Networks, 2020, NeurIPS.
[12] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.
[13] Cory Stephenson, et al. When and how epochwise double descent happens, 2021, ArXiv.
[14] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.
[15] E. Gardner, et al. Three unfinished works on the optimal storage capacity of networks, 1989.
[16] Christian Van den Broeck, et al. Statistical Mechanics of Learning, 2001.
[17] Florent Krzakala, et al. The Gaussian equivalence of generative models for learning with two-layer neural networks, 2020, ArXiv.
[18] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[19] R. Kühn. Statistical mechanics for neural networks with continuous-time dynamics, 1992.
[20] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[21] Florent Krzakala, et al. Generalisation error in learning with random features and the hidden manifold model, 2020, ICML.
[22] Kanter, et al. Eigenvalues of covariance matrices: Application to neural-network learning, 1991, Physical Review Letters.
[23] J. Hertz, et al. Generalization in a linear perceptron in the presence of noise, 1992.
[24] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[25] Manfred Opper, et al. Statistical mechanics of generalization, 1998.
[26] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[27] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[28] Reinhard Heckel, et al. Early Stopping in Deep Networks: Double Descent and How to Eliminate It, 2021, ICLR.
[29] Ioannis Mitliagkas, et al. A Modern Take on the Bias-Variance Tradeoff in Neural Networks, 2018, ArXiv.
[30] E. Gardner, et al. Optimal storage properties of neural network models, 1988.
[31] Levent Sagun, et al. The jamming transition as a paradigm to understand the loss landscape of deep neural networks, 2018, Physical Review E.
[32] Levent Sagun, et al. Triple descent and the two kinds of overfitting: where and why do they appear?, 2020, NeurIPS.
[33] Dongrui Wu, et al. Rethink the Connections among Generalization, Memorization, and the Spectral Bias of DNNs, 2020, IJCAI.
[34] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[35] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[36] Surya Ganguli, et al. Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics, 2021, ICLR.
[37] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[38] M. Opper. Statistical Mechanics of Learning: Generalization, 2002.
[39] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[40] Opper, et al. Generalization ability of perceptrons with continuous outputs, 1993, Physical Review E.
[41] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[42] Yoshiyuki Kabashima, et al. Erratum: A typical reconstruction limit of compressed sensing based on Lp-norm minimization, 2009, ArXiv.
[43] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.
[44] J. Zico Kolter, et al. A Continuous-Time View of Early Stopping for Least Squares Regression, 2018, AISTATS.
[45] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[46] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.
[47] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[48] Arthur Jacot, et al. Implicit Regularization of Random Feature Models, 2020, ICML.
[49] Chong You, et al. Rethinking Bias-Variance Trade-off for Generalization of Neural Networks, 2020, ICML.
[50] Shun-ichi Amari, et al. When Does Preconditioning Help or Hurt Generalization?, 2021, ICLR.
[51] E. Gardner. The space of interactions in neural network models, 1988.
[52] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Yuri Burda, et al. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, 2022, ArXiv.
[54] Amin Karbasi, et al. Multiple Descent: Design Your Own Generalization Curve, 2020, NeurIPS.
[55] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[56] S. Orszag, et al. Advanced mathematical methods for scientists and engineers I: asymptotic methods and perturbation theory, 1999.
[57] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[58] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[59] Levent Sagun, et al. On the interplay between data structure and loss function in classification problems, 2021, NeurIPS.
[60] M. Mézard, et al. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, 1986.
[61] Florent Krzakala, et al. Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime, 2020, ICML.
[62] Taiji Suzuki, et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint, 2020, ICLR.
[63] V. Marčenko, et al. Distribution of Eigenvalues for Some Sets of Random Matrices, 1967.