A Corrective View of Neural Networks: Representation, Memorization and Learning