Xiao Zhang | Lingxiao Wang | Yaodong Yu | Quanquan Gu
[1] Ohad Shamir,et al. Distribution-Specific Hardness of Learning Neural Networks , 2016, J. Mach. Learn. Res..
[2] Sanjeev Arora,et al. Provable learning of noisy-OR networks , 2016, STOC.
[3] Daniel Soudry,et al. No bad local minima: Data independent training error guarantees for multilayer neural networks , 2016, ArXiv.
[4] Nadav Cohen, et al. On the Expressive Power of Deep Learning: A Tensor Analysis, 2016, COLT.
[5] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[6] Amitabh Basu,et al. Lower bounds over Boolean inputs for deep neural networks with ReLU gates , 2017, Electron. Colloquium Comput. Complex..
[7] Surya Ganguli,et al. On the Expressive Power of Deep Neural Networks , 2016, ICML.
[8] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[9] Dmitry Yarotsky,et al. Error bounds for approximations with deep ReLU networks , 2016, Neural Networks.
[10] Liwei Wang,et al. The Expressive Power of Neural Networks: A View from the Width , 2017, NIPS.
[11] Peter Auer,et al. Exponentially many local minima for single neurons , 1995, NIPS.
[12] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[13] Anima Anandkumar,et al. Provable Methods for Training Neural Networks with Sparse Connectivity , 2014, ICLR.
[14] Martin J. Wainwright,et al. On the Learnability of Fully-Connected Neural Networks , 2017, AISTATS.
[15] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[16] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[17] Ohad Shamir,et al. Failures of Gradient-Based Deep Learning , 2017, ICML.
[18] Dmitry Yarotsky,et al. Optimal approximation of continuous functions by very deep ReLU networks , 2018, COLT.
[19] Mahdi Soltanolkotabi,et al. Learning ReLUs via Gradient Descent , 2017, NIPS.
[20] Adam Tauman Kalai,et al. Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression , 2011, NIPS.
[21] Tengyu Ma,et al. Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.
[22] Yingbin Liang,et al. Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression , 2018, ArXiv.
[23] Varun Kanade,et al. Reliably Learning the ReLU in Polynomial Time , 2016, COLT.
[24] Yuanzhi Li,et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation , 2017, NIPS.
[25] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[26] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[27] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[28] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[29] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[30] Martin J. Wainwright,et al. Learning Halfspaces and Neural Networks with Random Initialization , 2015, ArXiv.
[31] Yuan Cao,et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[32] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[33] Stuart Donnan,et al. In this number , 1994 .
[34] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[35] Matus Telgarsky,et al. Benefits of Depth in Neural Networks , 2016, COLT.
[36] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[37] Le Song,et al. Diverse Neural Network Learns True Target Functions , 2016, AISTATS.
[38] Kurt Hornik,et al. Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.
[39] Surya Ganguli,et al. Exponential expressivity in deep neural networks through transient chaos , 2016, NIPS.
[40] Mark Sellke,et al. Approximating Continuous Functions by ReLU Nets of Minimal Width , 2017, ArXiv.
[41] Jason D. Lee,et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation , 2018, ICML.
[42] Ohad Shamir,et al. Weight Sharing is Crucial to Succesful Optimization , 2017, ArXiv.
[43] Boris Hanin,et al. Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations , 2017, Mathematics.
[44] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.
[45] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[46] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[47] Adel Javanmard,et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.
[48] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[49] Tengyu Ma,et al. Identity Matters in Deep Learning , 2016, ICLR.
[50] Roi Livni,et al. On the Computational Efficiency of Training Neural Networks , 2014, NIPS.
[51] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[52] Adam Tauman Kalai,et al. The Isotron Algorithm: High-Dimensional Isotonic Regression , 2009, COLT.
[53] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[54] Anima Anandkumar,et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods , 2017 .
[55] Suvrit Sra,et al. Global optimality conditions for deep neural networks , 2017, ICLR.
[56] Inderjit S. Dhillon,et al. Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.
[57] Vivek Srikumar,et al. Expressiveness of Rectifier Networks , 2015, ICML.
[58] Matthias Hein,et al. The Loss Surface of Deep and Wide Neural Networks , 2017, ICML.
[59] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[60] Zhize Li,et al. Learning Two-layer Neural Networks with Symmetric Inputs , 2018, ICLR.
[61] ImageNet Classification with Deep Convolutional Neural Networks, 2013.
[62] Yuandong Tian,et al. When is a Convolutional Filter Easy To Learn? , 2017, ICLR.
[63] Ohad Shamir,et al. On the Quality of the Initial Basin in Overspecified Neural Networks , 2015, ICML.
[64] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[65] A. Montanari,et al. The landscape of empirical risk for nonconvex losses , 2016, The Annals of Statistics.
[66] Pramod Viswanath,et al. Learning One-hidden-layer Neural Networks under General Input Distributions , 2018, AISTATS.
[67] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[68] Prateek Jain,et al. Low-rank matrix completion using alternating minimization , 2012, STOC '13.
[69] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.