[1] Michael Biehl,et al. Learning by on-line gradient descent , 1995 .
[2] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[3] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[4] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[5] Joan Bruna,et al. Symmetry Breaking in Symmetric Tensor Decomposition , 2021, ArXiv.
[6] Yossi Arjevani,et al. Equivariant bifurcation, quadratic equivariants, and symmetry breaking for the standard representation of S_k , 2021, ArXiv.
[7] Michiel Straat,et al. Hidden Unit Specialization in Layered Neural Networks: ReLU vs. Sigmoidal Activation , 2019, Physica A: Statistical Mechanics and its Applications.
[8] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[9] Vardan Papyan,et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size , 2018 .
[10] Michael Field,et al. Symmetry breaking and the maximal isotropy subgroup conjecture for reflection groups , 1989 .
[11] Yann LeCun,et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond , 2016, arXiv:1611.07476.
[12] Jason Yosinski,et al. Measuring the Intrinsic Dimension of Objective Landscapes , 2018, ICLR.
[13] Gilad Yehudai,et al. On the Power and Limitations of Random Features for Understanding Neural Networks , 2019, NeurIPS.
[14] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[15] Yoshua Bengio,et al. Three Factors Influencing Minima in SGD , 2017, ArXiv.
[16] Yossi Arjevani,et al. Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry , 2020, NeurIPS.
[17] Fredrik Meyer,et al. Representation theory , 2015 .
[18] L. Bottou. Stochastic Gradient Learning in Neural Networks , 1991 .
[19] Yossi Arjevani,et al. Symmetry & critical points for a model shallow neural network , 2020, ArXiv.
[20] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[21] T. Watkin,et al. The statistical mechanics of learning a rule , 1993 .
[22] Jeffrey Pennington,et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network , 2018, NeurIPS.
[23] Shankar Krishnan,et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density , 2019, ICML.
[24] Tengyu Ma,et al. Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.
[25] Yuanzhi Li,et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation , 2017, NIPS.
[26] Sompolinsky,et al. Statistical mechanics of learning from examples , 1992, Physical Review A: Atomic, Molecular, and Optical Physics.
[27] Saad,et al. Exact solution for on-line learning in multilayer neural networks , 1995, Physical Review Letters.
[28] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[29] Michael Field,et al. Dynamics and Symmetry , 2007 .
[30] Zhanxing Zhu,et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes , 2017, ArXiv.
[31] Ohad Shamir,et al. Distribution-Specific Hardness of Learning Neural Networks , 2016, J. Mach. Learn. Res..
[32] Taiji Suzuki,et al. On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting , 2021, ICML.
[33] Jeffrey Pennington,et al. Nonlinear random matrix theory for deep learning , 2019, NIPS.
[34] David Tse,et al. Porcupine Neural Networks: (Almost) All Local Optima are Global , 2017, ArXiv.
[35] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[36] Eric Vanden-Eijnden,et al. Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions , 2020, NeurIPS.
[37] M. Mézard,et al. Information, Physics, and Computation , 2009 .
[38] Francis Bach,et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport , 2018, NeurIPS.
[39] Michael Biehl,et al. On-line backpropagation in two-layered neural networks , 1995 .
[40] Florent Krzakala,et al. Generalisation dynamics of online learning in over-parameterised neural networks , 2019, ArXiv.
[41] Kurt Keutzer,et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries , 2018, NeurIPS.
[42] Nicolas Macris,et al. The committee machine: computational to statistical gaps in learning a two-layers neural network , 2018, NeurIPS.
[43] Saad,et al. On-line learning in soft committee machines , 1995, Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics.
[44] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[45] Christian Van den Broeck,et al. Statistical Mechanics of Learning , 2001 .
[46] E. Gardner,et al. Three unfinished works on the optimal storage capacity of networks , 1989 .
[47] Wolfgang Kinzel,et al. Improving a Network Generalization Ability by Selecting Examples , 1990 .
[48] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[49] Shun-ichi Amari,et al. Universal statistics of Fisher information in deep neural networks: mean field approach , 2018, AISTATS.
[50] M. Mézard,et al. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications , 1986 .
[51] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[52] Zhenyu Liao,et al. A Random Matrix Approach to Neural Networks , 2017, ArXiv.
[53] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[54] Ohad Shamir,et al. Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..
[55] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[56] Allan Sly,et al. Proof of the Satisfiability Conjecture for Large k , 2014, STOC.
[57] Ethan Dyer,et al. Gradient Descent Happens in a Tiny Subspace , 2018, ArXiv.
[58] M. Golubitsky. The Bénard Problem, Symmetry and the Lattice of Isotropy Subgroups , 1983 .
[59] Rina Panigrahy,et al. Electron-Proton Dynamics in Deep Learning , 2017, ArXiv.
[60] Andrea Montanari,et al. When do neural networks outperform kernel methods? , 2020, NeurIPS.
[61] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.
[62] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[63] C. Thomas. Representations of Finite and Lie Groups , 2004 .
[64] Florent Krzakala,et al. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup , 2019, NeurIPS.