Gradient Dynamics of Shallow Univariate ReLU Networks
Francis Williams | Matthew Trager | Daniele Panozzo | Cláudio T. Silva | Denis Zorin | Joan Bruna
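For context (an illustration added here, not material from the paper): the models in question are one-hidden-layer ReLU networks on a scalar input, f(x) = sum_i a_i * max(0, w_i * x + b_i), trained by gradient descent on a squared loss. The NumPy sketch below instantiates that setup; the width, learning rate, step count, and sine target are arbitrary assumptions for demonstration and are not taken from the paper.

```python
import numpy as np

# Minimal sketch: a shallow univariate ReLU network
#   f(x) = sum_i a_i * max(0, w_i * x + b_i)
# trained by full-batch gradient descent on mean squared error.
# All hyperparameters and the target function are illustrative choices.

rng = np.random.default_rng(0)
n_hidden, lr, steps = 50, 1e-2, 2000

x = np.linspace(-1.0, 1.0, 32)   # scalar (univariate) training inputs
y = np.sin(np.pi * x)            # arbitrary target function

w = rng.normal(size=n_hidden)                       # hidden-layer slopes
b = rng.normal(size=n_hidden)                       # hidden-layer biases
a = rng.normal(size=n_hidden) / np.sqrt(n_hidden)   # output weights

for _ in range(steps):
    pre = np.outer(x, w) + b      # (n_samples, n_hidden) pre-activations
    h = np.maximum(pre, 0.0)      # ReLU activations
    r = h @ a - y                 # residuals f(x) - y
    mask = (pre > 0).astype(float)  # ReLU derivative (taken as 0 at the kink)
    # Gradients of the mean squared loss with respect to a, w, b.
    grad_a = h.T @ r / len(x)
    grad_w = ((r[:, None] * mask * a) * x[:, None]).sum(axis=0) / len(x)
    grad_b = (r[:, None] * mask * a).sum(axis=0) / len(x)
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

mse = np.mean((np.maximum(np.outer(x, w) + b, 0.0) @ a - y) ** 2)
print(f"final MSE: {mse:.6f}")
```

Each hidden unit contributes one kink at x = -b_i / w_i, so gradient descent on (w, b, a) moves both the knot positions and the slopes of the resulting piecewise-linear fit; tracking how those knots and slopes evolve is the kind of dynamics the paper analyzes.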
[1] Sylvain Gelly, et al. Gradient Descent Quantizes ReLU Network Features, 2018, arXiv.
[2] Joan Bruna, et al. Deep Geometric Prior for Surface Reconstruction, 2019, CVPR.
[3] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[4] Julien Mairal, et al. On the Inductive Bias of Neural Tangent Kernels, 2019, NeurIPS.
[5] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[6] T. Hotz, et al. Representation by Integrating Reproducing Kernels, 2012, arXiv:1202.4443.
[7] Nathan Srebro, et al. How do infinite width bounded norm networks look in function space?, 2019, COLT.
[8] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[9] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[10] Lei Wu, et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, 2019, Science China Mathematics.
[11] Ronen Basri, et al. Efficient Representation of Low-Dimensional Manifolds using Deep Networks, 2016, ICLR.
[12] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[13] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[14] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks, 2019, AAAI.
[15] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[16] Samet Oymak, et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?, 2018, ICML.
[17] Joan Bruna, et al. Global convergence of neuron birth-death dynamics, 2019, ICML.
[18] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming, 2018, arXiv.
[19] Joan Bruna, et al. Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys, 2018, arXiv.
[20] Andrea Montanari, et al. Linearized two-layers neural networks in high dimension, 2019, The Annals of Statistics.
[21] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[22] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[23] G. Petrova, et al. Nonlinear Approximation and (Deep) ReLU Networks, 2019, Constructive Approximation.
[24] Saburou Saitoh. Theory of Reproducing Kernels and Its Applications, 1988.
[25] David Rolnick, et al. Complexity of Linear Regions in Deep Networks, 2019, ICML.
[26] Joan Bruna, et al. Spurious Valleys in Two-layer Neural Network Optimization Landscapes, 2018, arXiv:1802.06384.
[27] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.
[28] F. Clarke. Generalized gradients and applications, 1975.
[29] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[30] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[31] Yuan Cao, et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks, 2019, arXiv.
[32] Grant M. Rotskoff, et al. Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, 2018, arXiv.
[33] Justin A. Sirignano, et al. Mean field analysis of neural networks: A central limit theorem, 2018, Stochastic Processes and their Applications.
[34] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.