Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
[1] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[2] Guozhong An, et al. The Effects of Adding Noise During Backpropagation Training on a Generalization Performance, 1996, Neural Computation.
[3] Leslie Pack Kaelbling, et al. Generalization in Deep Learning, 2017, ArXiv.
[4] F. Rosenblatt, et al. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.
[5] Petri Koistinen, et al. Using additive noise in back-propagation training, 1992, IEEE Trans. Neural Networks.
[6] Leslie Pack Kaelbling, et al. Elimination of All Bad Local Minima in Deep Learning, 2019, AISTATS.
[7] Dawei Li, et al. On the Benefit of Width for Neural Networks: Disappearance of Basins, 2018, SIAM J. Optim.
[8] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[9] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[10] Yuandong Tian, et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima, 2017, ICML.
[11] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[12] Thomas Laurent, et al. Deep linear neural networks with arbitrary loss: All local minima are global, 2017, ArXiv.
[13] Yi Zhou, et al. Critical Points of Linear Neural Networks: Analytical Forms and Landscape Properties, 2017, ICLR.
[14] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.
[15] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[16] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[17] Gilad Yehudai, et al. On the Power and Limitations of Random Features for Understanding Neural Networks, 2019, NeurIPS.
[18] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[19] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[20] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[21] Suvrit Sra, et al. Efficiently testing local optimality and escaping saddles for ReLU networks, 2018, ICLR.
[22] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.
[23] Quynh Nguyen, et al. On Connected Sublevel Sets in Deep Learning, 2019, ICML.
[24] Samet Oymak, et al. Stochastic Gradient Descent Learns State Equations with Nonlinear Activations, 2018, COLT.
[25] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.
[26] Thomas Laurent, et al. The Multilinear Structure of ReLU Networks, 2017, ICML.
[27] Chinmay Hegde, et al. Learning ReLU Networks via Alternating Minimization, 2018, ArXiv.
[28] Alberto Tesi, et al. On the Problem of Local Minima in Backpropagation, 1992, IEEE Trans. Pattern Anal. Mach. Intell.
[29] Junwei Lu, et al. On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond, 2018, ArXiv.
[30] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[31] Peter Auer, et al. Exponentially many local minima for single neurons, 1995, NIPS.
[32] Xiao Zhang, et al. Learning One-hidden-layer ReLU Networks via Gradient Descent, 2018, AISTATS.
[33] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[34] Albert B. Novikoff. On convergence proofs for perceptrons, 1963.
[35] R. Srikant, et al. Adding One Neuron Can Eliminate All Bad Local Minima, 2018, NeurIPS.
[36] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[37] Yonina C. Eldar, et al. Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow, 2016, IEEE Transactions on Information Theory.
[38] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[39] Songtao Lu, et al. On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions, 2018, ArXiv.
[40] Dawei Li, et al. Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations, 2018, ArXiv.
[41] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[42] F. Clarke. Optimization and Nonsmooth Analysis, 1983.
[43] Surya Ganguli, et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[44] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[45] Chuan Wang, et al. Training neural networks with additive noise in the desired signal, 1999, IEEE Trans. Neural Networks.
[46] Yi Zhou, et al. Convergence of SGD in Learning ReLU Models with Separable Data, 2018, ArXiv.
[47] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[48] Yingbin Liang, et al. Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy, 2018, IEEE Transactions on Signal Processing.
[49] Yingbin Liang, et al. Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression, 2018, ArXiv.
[50] Amir Salman Avestimehr, et al. Fitting ReLUs via SGD and Quantized SGD, 2019, IEEE International Symposium on Information Theory (ISIT).
[51] Matthias Hein, et al. Optimization Landscape and Expressivity of Deep CNNs, 2017, ICML.
[52] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[53] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[54] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[55] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[56] Gang Wang, et al. Real-Time Power System State Estimation and Forecasting via Deep Unrolled Neural Networks, 2018, IEEE Transactions on Signal Processing.
[57] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[58] Matus Telgarsky, et al. Benefits of Depth in Neural Networks, 2016, COLT.
[59] Manfred K. Warmuth, et al. Relating Data Compression and Learnability, 2003.
[60] Thomas Laurent, et al. Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global, 2017, ICML.
[61] Yi Zhou, et al. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?, 2018.
[62] Suvrit Sra, et al. Small nonlinearities in activation functions create bad local minima in neural networks, 2018, ICLR.
[63] Suvrit Sra, et al. A Critical View of Global Optimality in Deep Learning, 2018, ArXiv.
[64] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[65] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.