R. Srikant | Shiyu Liang | Ruoyu Sun