Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
[1] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.
[2] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[3] Shang-Hua Teng, et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, 2001, STOC '01.
[4] Rocco A. Servedio, et al. Learning DNF in time 2^{Õ(n^{1/3})}, 2004, J. Comput. Syst. Sci.
[5] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[6] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] Karthik Sridharan. Machine Learning Theory (CS 6783), 2014.
[8] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[9] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.
[10] Alexandr Andoni, et al. Learning Polynomials with Neural Networks, 2014, ICML.
[11] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[12] Percy Liang. CS229T/STAT231: Statistical Learning Theory (Winter 2016), 2015.
[13] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[14] Yuchen Zhang, et al. L1-regularized Neural Networks are Improperly Learnable in Polynomial Time, 2015, ICML.
[15] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[16] Andreas Maurer, et al. A Vector-Contraction Inequality for Rademacher Complexities, 2016, ALT.
[17] Le Song, et al. Diversity Leads to Generalization in Neural Networks, 2016, ArXiv.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Yuanzhi Li, et al. Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates, 2016, NIPS.
[20] Amit Daniely, et al. Complexity theoretic limitations on learning halfspaces, 2015, STOC.
[21] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[22] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, ArXiv.
[23] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[24] Yuandong Tian, et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis, 2017, ICML.
[25] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[26] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[27] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[28] Martin J. Wainwright, et al. On the Learnability of Fully-Connected Neural Networks, 2017, AISTATS.
[29] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[30] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[31] Guanghui Lan, et al. Theoretical properties of the global optimizer of two layer neural network, 2017, ArXiv.
[32] Ilias Diakonikolas, et al. Sample-Optimal Density Estimation in Nearly-Linear Time, 2015, SODA.
[33] Yuanzhi Li, et al. Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations, 2017, ICML.
[34] Le Song, et al. Diverse Neural Network Learns True Target Functions, 2016, AISTATS.
[35] Varun Kanade, et al. Reliably Learning the ReLU in Polynomial Time, 2016, COLT.
[36] Yuanzhi Li, et al. Algorithmic Regularization in Over-parameterized Matrix Recovery, 2017, ArXiv.
[37] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[38] Ohad Shamir, et al. Distribution-Specific Hardness of Learning Neural Networks, 2016, J. Mach. Learn. Res.
[39] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[40] Ohad Shamir, et al. Size-Independent Sample Complexity of Neural Networks, 2017, COLT.
[41] Alex Zhai, et al. The CLT in high dimensions: Quantitative bounds via martingale embedding, 2018, The Annals of Probability.
[42] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[43] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[44] Santosh S. Vempala, et al. Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks, 2018, ArXiv.
[45] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[46] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[47] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[48] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.
[49] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[50] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[51] David P. Woodruff, et al. Learning Two Layer Rectified Neural Networks in Polynomial Time, 2018, COLT.
[52] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[53] Zhize Li, et al. Learning Two-layer Neural Networks with Symmetric Inputs, 2018, ICLR.
[54] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[55] Adam R. Klivans, et al. Learning Neural Networks with Two Nonlinear Layers in Polynomial Time, 2017, COLT.
[56] M. Wainwright. Basic tail and concentration bounds, 2019, High-Dimensional Statistics.
[57] Greg Yang, et al. Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation, 2019, ArXiv.
[58] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[59] Yuanzhi Li, et al. Can SGD Learn Recurrent Neural Networks with Provable Generalization?, 2019, NeurIPS.
[60] Yuanzhi Li, et al. On the Convergence Rate of Training Recurrent Neural Networks, 2018, NeurIPS.
[61] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[62] Yuanzhi Li, et al. What Can ResNet Learn Efficiently, Going Beyond Kernels?, 2019, NeurIPS.
[63] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.