Double-descent curves in neural networks: a new perspective using Gaussian processes
[1] James B. Simon, et al. Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting, 2022, ArXiv.
[2] Yue M. Lu, et al. An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings, 2022, arXiv:2205.06308.
[3] Hayden Schaeffer, et al. Conditioning of Random Feature Matrices: Double Descent and Generalization Error, 2021, ArXiv.
[4] J. Suykens, et al. On the Double Descent of Random Features Models Trained with SGD, 2021, NeurIPS.
[5] James B. Simon, et al. The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks, 2021, arXiv:2110.03922.
[6] Zhi-Hua Zhou, et al. Towards an Understanding of Benign Overfitting in Neural Networks, 2021, ArXiv.
[7] F. Krzakala, et al. Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime, 2021, NeurIPS.
[8] Mikhail Belkin, et al. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation, 2021, Acta Numerica.
[9] O. Zeitouni, et al. Lower Bounds on the Generalization Error of Nonlinear Learning Models, 2021, IEEE Transactions on Information Theory.
[10] Masaaki Imaizumi, et al. Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks, 2021, ArXiv.
[11] Matthieu Wyart, et al. Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training, 2020, ArXiv.
[12] Haim Sompolinsky, et al. Statistical Mechanics of Deep Linear Neural Networks: The Backpropagating Kernel Renormalization, 2020, Physical Review X.
[13] Ard A. Louis, et al. Generalization bounds for deep learning, 2020, ArXiv.
[14] Jeffrey Pennington, et al. Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition, 2020, NeurIPS.
[15] Zhenyu Liao, et al. Kernel regression in high dimension: Refined analysis beyond double descent, 2020, AISTATS.
[16] Jaehoon Lee, et al. Finite Versus Infinite Neural Networks: an Empirical Study, 2020, NeurIPS.
[17] Jeffrey Pennington, et al. The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization, 2020, ICML.
[18] Guillermo Valle Pérez, et al. Is SGD a Bayesian sampler? Well, almost, 2020, J. Mach. Learn. Res.
[19] Andrea Montanari, et al. When do neural networks outperform kernel methods?, 2020, NeurIPS.
[20] C. Pehlevan, et al. Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks, 2020, Nature Communications.
[21] Arthur Jacot, et al. Kernel Alignment Risk Estimator: Risk Prediction from Training Data, 2020, NeurIPS.
[22] Zhenyu Liao, et al. A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent, 2020, NeurIPS.
[23] Z. Fan, et al. Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks, 2020, NeurIPS.
[24] Sundeep Rangan, et al. Generalization Error of Generalized Linear Models in High Dimensions, 2020, ICML.
[25] Shun-ichi Amari. Understand it in 5 minutes!? Skimming famous papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[26] Florent Krzakala, et al. Generalisation error in learning with random features and the hidden manifold model, 2020, ICML.
[27] Pavel Izmailov, et al. Bayesian Deep Learning and a Probabilistic Perspective of Generalization, 2020, NeurIPS.
[28] Florent Krzakala, et al. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices, 2020, arXiv:2002.04372.
[29] Blake Bordelon, et al. Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks, 2020, ICML.
[30] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.
[31] Greg Yang, et al. Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes, 2019, NeurIPS.
[32] Yuan Cao, et al. Towards Understanding the Spectral Bias of Deep Learning, 2019, IJCAI.
[33] Guillermo Valle Pérez, et al. Neural networks are a priori biased towards Boolean functions with low entropy, 2019, ArXiv.
[34] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[35] Nicolas Boumal, et al. Efficiently escaping saddle points on manifolds, 2019, NeurIPS.
[36] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[37] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[38] Philip M. Long, et al. On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network, 2019, Neural Computation.
[39] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[40] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[41] Jaehoon Lee, et al. Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes, 2018, ICLR.
[42] Laurence Aitchison, et al. Deep Convolutional Networks as shallow Gaussian Processes, 2018, ICLR.
[43] Chico Q. Camargo, et al. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018, ICLR.
[44] Chico Q. Camargo, et al. Input–output maps are strongly biased towards simple outputs, 2018, Nature Communications.
[45] Richard E. Turner, et al. Gaussian Process Behaviour in Wide Deep Neural Networks, 2018, ICLR.
[46] Jeffrey Pennington, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[47] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[48] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[49] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[50] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[51] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[52] A. Montanari, et al. The spectral norm of random inner-product kernel matrices, 2015, arXiv:1507.05343.
[53] M. Peligrad, et al. On the limiting spectral distribution for a large class of symmetric random matrices with correlated entries, 2015.
[54] Erich Elsen, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, ArXiv.
[55] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[56] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[57] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[58] Ard A. Louis, et al. The Arrival of the Frequent: How Bias in Genotype-Phenotype Maps Can Steer Populations to Local Optima, 2014, PLoS ONE.
[59] Michael M. Bronstein, et al. Almost-commuting matrices are almost jointly diagonalizable, 2013, ArXiv.
[60] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive computation and machine learning.
[61] Xiuyuan Cheng, et al. The Spectrum of Random Inner-Product Kernel Matrices, 2012, arXiv:1202.3155.
[62] Noureddine El Karoui, et al. The spectrum of kernel random matrices, 2010, arXiv:1001.0492.
[63] Jihnhee Yu, et al. Measures, Integrals and Martingales, 2007, Technometrics.
[64] Yuan Yao, et al. Mercer's Theorem, Feature Maps, and Smoothing, 2006, COLT.
[65] Theodore P. Hill, et al. Necessary and sufficient condition that the limit of Stieltjes transforms is a Stieltjes transform, 2003, J. Approx. Theory.
[66] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.
[67] Peter Sollich, et al. Learning Curves for Gaussian Process Regression: Approximations and Bounds, 2001, Neural Computation.
[68] Christian Van den Broeck, et al. Statistical Mechanics of Learning, 2001.
[69] C. Tracy, et al. Introduction to Random Matrices, 1992, arXiv:hep-th/9210073.
[70] Sompolinsky, et al. Statistical mechanics of learning from examples, 1992, Physical Review A.
[71] F. Vallet, et al. Linear and Nonlinear Extension of the Pseudo-Inverse Solution for Learning Boolean Functions, 1989.
[72] V. Marčenko, et al. Distribution of Eigenvalues for Some Sets of Random Matrices, 1967.
[73] E. Wigner. Characteristic Vectors of Bordered Matrices with Infinite Dimensions I, 1955.
[74] J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.
[75] Ayça Özçelikkale, et al. Double Descent in Random Feature Models: Precise Asymptotic Analysis for General Convex Regularization, 2022, ArXiv.
[76] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[77] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[78] G. Micula, et al. Numerical Treatment of the Integral Equations, 1999.
[79] Radford M. Neal. Bayesian learning for neural networks, 1995.
[80] Vladimir Vapnik. The Nature of Statistical Learning Theory, 1995.