When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
[1] Yoshua Bengio, et al. On the Learning Dynamics of Deep Neural Networks, 2018, ArXiv.
[2] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[3] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[4] Martin J. Wainwright, et al. Information-theoretic lower bounds on the oracle complexity of convex optimization, 2009, NIPS.
[5] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[6] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[7] J. Borwein, et al. Convex Analysis And Nonlinear Optimization, 2000.
[8] Lin Xiao, et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization, 2009, J. Mach. Learn. Res.
[9] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[10] Ohad Shamir, et al. Stochastic Convex Optimization, 2009, COLT.
[11] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[12] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, ArXiv.
[13] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[14] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[15] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[16] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[17] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.
[18] Yoram Singer, et al. Efficient Online and Batch Learning Using Forward Backward Splitting, 2009, J. Mach. Learn. Res.
[19] Matus Telgarsky, et al. Margins, Shrinkage, and Boosting, 2013, ICML.
[20] Francis R. Bach, et al. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, 2013, J. Mach. Learn. Res.