Learning a Single Neuron with Gradient Methods
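The paper's subject is learning a single neuron with gradient methods. As an illustrative aid only, the snippet below sketches the canonical version of that problem: fitting a single ReLU neuron, y = max(0, ⟨w*, x⟩), by plain gradient descent on the empirical squared loss, a setting also considered in [16]. The Gaussian data model, noiseless labels, step size, and iteration count are assumptions made for this example, not the paper's exact algorithm or guarantees.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's exact setting):
# fit a single ReLU neuron y = max(0, <w*, x>) with plain gradient descent
# on the empirical squared loss, using i.i.d. Gaussian inputs and noiseless labels.

rng = np.random.default_rng(0)
d, n = 20, 5000
w_star = rng.standard_normal(d)        # ground-truth weight vector (assumed)
X = rng.standard_normal((n, d))        # i.i.d. Gaussian inputs (assumed)
y = np.maximum(X @ w_star, 0.0)        # noiseless ReLU labels

w = 0.1 * rng.standard_normal(d)       # random initialization
lr = 0.05                              # step size chosen for illustration
for _ in range(500):
    pred = np.maximum(X @ w, 0.0)
    # Gradient of the empirical squared loss, taking ReLU'(z) = 1{z > 0}.
    grad = X.T @ ((pred - y) * (X @ w > 0)) / n
    w -= lr * grad

print("relative error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))
```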
[1] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks, 2019, AAAI.
[2] Yan Shuo Tan, et al. Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval, 2019, ArXiv.
[3] Gilad Yehudai, et al. On the Power and Limitations of Random Features for Understanding Neural Networks, 2019, NeurIPS.
[4] Yuan Cao, et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks, 2019, ArXiv.
[5] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[6] Amir Salman Avestimehr, et al. Fitting ReLUs via SGD and Quantized SGD, 2019, IEEE International Symposium on Information Theory (ISIT).
[7] Samet Oymak, et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?, 2018, ICML.
[8] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[9] Samet Oymak, et al. Stochastic Gradient Descent Learns State Equations with Nonlinear Activations, 2018, COLT.
[10] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[11] Karthik Sridharan, et al. Uniform Convergence of Gradients for Non-Convex Learning and Optimization, 2018, NeurIPS.
[12] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[13] Yuandong Tian, et al. When is a Convolutional Filter Easy To Learn?, 2017, ICLR.
[14] Ohad Shamir, et al. Distribution-Specific Hardness of Learning Neural Networks, 2016, J. Mach. Learn. Res.
[15] A. Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[16] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[17] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[18] Yuandong Tian, et al. An Analytical Formula of Population Gradient for Two-layered ReLU Network and its Applications in Convergence and Critical Point Analysis, 2017, ICML.
[19] Anima Anandkumar, et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods, 2017.
[20] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[21] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[22] Varun Kanade, et al. Reliably Learning the ReLU in Polynomial Time, 2016, COLT.
[23] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, IEEE International Symposium on Information Theory (ISIT).
[24] John Wright, et al. When Are Nonconvex Problems Not Scary?, 2015, ArXiv.
[25] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[26] Ohad Shamir, et al. A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate, 2014, ICML.
[27] Ohad Shamir. A Variant of Azuma's Inequality for Martingales with Subgaussian Tails, 2011, ArXiv.
[28] Adam Tauman Kalai, et al. Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression, 2011, NIPS.
[29] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[30] Adam Tauman Kalai, et al. The Isotron Algorithm: High-Dimensional Isotonic Regression, 2009, COLT.
[31] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.