High-dimensional dynamics of generalization error in neural networks
[1] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[2] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[3] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[4] Raj Rao Nadakuditi, et al. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, 2009, arXiv:0910.2120.
[5] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[6] Surya Ganguli, et al. Statistical Mechanics of Deep Learning, 2020, Annual Review of Condensed Matter Physics.
[7] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[8] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[9] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[10] Yoshua Bengio, et al. Why Does Unsupervised Pre-training Help Deep Learning?, 2010, AISTATS.
[11] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper), 2018, NeurIPS.
[12] Zhenyu Liao, et al. A Random Matrix Approach to Neural Networks, 2017, arXiv.
[13] Thomas L. Griffiths, et al. Advances in Neural Information Processing Systems 21, 2009, NIPS.
[14] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[15] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[16] Pierre Baldi, et al. Temporal Evolution of Generalization during Learning in Linear Networks, 1991, Neural Computation.
[17] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.
[18] Robert H. Dodier, et al. Geometry of Early Stopping in Linear Networks, 1995, NIPS.
[19] Kenji Fukumizu, et al. Effect of Batch Learning in Multilayer Neural Networks, 1998, ICONIP.
[20] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[21] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[22] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[23] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[24] A. P. Dunmur, et al. Learning and generalization in a linear perceptron stochastically trained with noisy data, 1993.
[25] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[26] Christian Van den Broeck. Statistical Mechanics of Learning, 2001.
[27] Levent Sagun, et al. A jamming transition from under- to over-parametrization affects generalization in deep learning, 2018, Journal of Physics A: Mathematical and Theoretical.
[28] Shun-ichi Amari, et al. Dynamics of learning near singularities in radial basis function networks, 2008, Neural Networks.
[29] N. Pillai, et al. Universality of covariance matrices, 2011, arXiv:1110.2501.
[30] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[31] Thomas Villmann, et al. Similarity-Based Clustering: Recent Developments and Biomedical Applications (outcome of a Dagstuhl Seminar), 2009, Similarity-Based Clustering.
[32] Michael Biehl, et al. Statistical Mechanics of On-line Learning, 2009, Similarity-Based Clustering.
[33] Kanter, et al. Eigenvalues of covariance matrices: Application to neural-network learning, 1991, Physical Review Letters.
[34] Sompolinsky, et al. Statistical mechanics of learning from examples, 1992, Physical Review A.
[35] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[36] Jonathan Kadmon, et al. Optimal Architectures in a Solvable Model of Deep Networks, 2016, NIPS.
[37] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[38] M. Rattray, et al. Statistical mechanics of learning multiple orthogonal signals: asymptotic theory and fluctuation effects, 2007, Physical Review E.
[39] Elad Hoffer, et al. Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2017, ICLR.
[40] Klaus-Robert Müller, et al. Asymptotic statistical theory of overtraining and cross-validation, 1997, IEEE Trans. Neural Networks.
[41] V. Marčenko, et al. Distribution of eigenvalues for some sets of random matrices, 1967.
[42] Kinouchi, et al. On-line versus off-line learning in the linear perceptron: A comparative study, 1995, Physical Review E.
[43] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.
[44] Rich Caruana, et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping, 2000, NIPS.
[45] Klaus-Robert Müller, et al. Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?, 1995, NIPS.
[46] T. Watkin, et al. The statistical mechanics of learning a rule, 1993.
[47] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[48] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[49] J. W. Silverstein, et al. Spectral Analysis of Large Dimensional Random Matrices, 2009.
[50] Eugenio Culurciello, et al. An Analysis of Deep Neural Network Models for Practical Applications, 2016, arXiv.
[51] J. Hertz, et al. Generalization in a linear perceptron in the presence of noise, 1992.
[52] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[53] Surya Ganguli, et al. An equivalence between high dimensional Bayes optimal inference and M-estimation, 2016, NIPS.
[54] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[55] Saad, et al. Exact solution for on-line learning in multilayer neural networks, 1995, Physical Review Letters.
[56] Yves Chauvin. Generalization Dynamics in LMS Trained Linear Networks, 1990, NIPS.
[57] Surya Ganguli, et al. Statistical Mechanics of Optimal Convex Inference in High Dimensions, 2016.
[58] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[59] Shun-ichi Amari, et al. Dynamics of Learning Near Singularities in Layered Networks, 2008, Neural Computation.
[60] Jiri Matas, et al. All you need is a good init, 2015, ICLR.
[61] Kurt Hornik, et al. Learning in linear neural networks: a survey, 1995, IEEE Trans. Neural Networks.
[62] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[63] Nello Cristianini, et al. Supervised and Unsupervised Learning, 2004.
[64] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.
[65] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[66] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[67] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[68] Raj Rao Nadakuditi, et al. The singular values and vectors of low rank perturbations of large rectangular random matrices, 2011, J. Multivar. Anal.
[69] Kurt Hornik, et al. Supervised and Unsupervised Learning in Linear Networks, 1990.
[70] S. Ganguli, et al. Statistical Mechanics of High Dimensional Inference Supplementary Material, 2016.
[71] S. Ganguli, et al. Statistical mechanics of complex neural systems and high dimensional data, 2013, arXiv:1301.7115.
[72] Taiji Suzuki, et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint, 2020, ICLR.