Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity