[1] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[2] Thomas Laurent, et al. The Multilinear Structure of ReLU Networks, 2017, ICML.
[3] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[4] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[5] Yi Zhou, et al. Critical Points of Neural Networks: Analytical Forms and Landscape Properties, 2017, arXiv.
[6] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[7] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[8] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[9] Alberto Seeger, et al. A Variational Approach to Copositive Matrices, 2010, SIAM Rev.
[10] Suvrit Sra, et al. Global optimality conditions for deep neural networks, 2017, ICLR.
[11] Suvrit Sra, et al. Small nonlinearities in activation functions create bad local minima in neural networks, 2018, ICLR.
[12] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[13] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[14] Yu. S. Ledyaev, et al. Nonsmooth analysis and control theory, 1998.
[15] Ohad Shamir, et al. Are ResNets Provably Better than Linear Predictors?, 2018, NeurIPS.
[16] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT.
[17] Gang Wang, et al. Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization, 2018, IEEE Transactions on Signal Processing.
[18] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[19] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[20] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[21] Yuandong Tian, et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis, 2017, ICML.
[22] J. Nocedal, et al. A Limited Memory Algorithm for Bound Constrained Optimization, 1995, SIAM J. Sci. Comput.
[23] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[24] Jason D. Lee, et al. No Spurious Local Minima in a Two Hidden Unit ReLU Network, 2018, ICLR.
[25] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[26] D. H. Jacobson, et al. Copositive matrices and definiteness of quadratic forms subject to homogeneous linear inequality constraints, 1981.
[27] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[28] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[29] Katta G. Murty, et al. Some NP-complete problems in quadratic and nonlinear programming, 1987, Math. Program.
[30] Panos M. Pardalos, et al. Quadratic programming with one negative eigenvalue is NP-hard, 1991, J. Glob. Optim.
[31] Alexander J. Smola, et al. A Generic Approach for Escaping Saddle points, 2017, AISTATS.
[32] J. Borwein, et al. Convex Analysis and Nonlinear Optimization, 2000.
[33] X. H. Yu, et al. On the local minima free condition of backpropagation learning, 1995, IEEE Transactions on Neural Networks.
[34] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[35] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, arXiv.
[36] Xiao Zhang, et al. Learning One-hidden-layer ReLU Networks via Gradient Descent, 2018, AISTATS.
[37] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[38] A. Seeger. Eigenvalue analysis of equilibrium processes defined by linear complementarity conditions, 1999.
[39] Yuandong Tian, et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima, 2017, ICML.
[40] Matthias Hein, et al. Optimization Landscape and Expressivity of Deep CNNs, 2017, ICML.