On the Power and Limitations of Random Features for Understanding Neural Networks
[1] Benjamin Recht, et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.
[2] Andrea Montanari, et al. Linearized two-layers neural networks in high dimension, 2019, The Annals of Statistics.
[3] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[4] Ambuj Tewari, et al. On the Approximation Properties of Random ReLU Features, 2018.
[5] Yuchen Zhang, et al. L1-regularized Neural Networks are Improperly Learnable in Polynomial Time, 2015, ICML.
[6] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.
[7] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[8] Andrea Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[9] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks, 2019, AAAI.
[10] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[11] B. Gelbaum, et al. Bases of tensor products of Banach spaces, 1961.
[12] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[13] Yuanzhi Li, et al. Can SGD Learn Recurrent Neural Networks with Provable Generalization?, 2019, NeurIPS.
[14] Amir Globerson, et al. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem, 2018, ICML.
[15] Gang Wang, et al. Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization, 2018, IEEE Transactions on Signal Processing.
[16] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[17] Amir Globerson, et al. Over-parameterization Improves Generalization in the XOR Detection Problem, 2018, arXiv.
[18] Ali Rahimi, et al. Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.
[19] Ambuj Tewari, et al. Random ReLU Features: Universality, Approximation, and Composition, 2018, arXiv.
[20] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Transactions on Information Theory.
[21] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Foundations and Trends in Optimization.
[22] Andrew R. Barron, et al. Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $\ell^1$ and $\ell^0$ Controls, 2016, IEEE Transactions on Information Theory.
[23] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[24] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[25] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[26] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[27] Ohad Shamir, et al. Distribution-Specific Hardness of Learning Neural Networks, 2016, Journal of Machine Learning Research.
[28] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[29] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[30] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[31] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[32] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[33] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[34] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, arXiv.
[35] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.
[36] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[37] Yuanzhi Li, et al. Algorithmic Regularization in Over-parameterized Matrix Recovery, 2017, arXiv.
[38] Michel Ledoux. The concentration of measure phenomenon, 2001.
[39] Alexandr Andoni, et al. Learning Polynomials with Neural Networks, 2014, ICML.
[40] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, Journal of Machine Learning Research.
[41] Yuan Cao, et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks, 2019, arXiv.
[42] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[43] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[44] Elad Hoffer, et al. Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2017, ICLR.
[45] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.