暂无分享,去创建一个
[1] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[2] Vladimir Braverman,et al. The Benefits of Implicit Regularization from SGD in Least Squares Problems , 2021, ArXiv.
[3] A. Tsigler,et al. Benign overfitting in ridge regression , 2020 .
[4] Philip M. Long,et al. Benign overfitting in linear regression , 2019, Proceedings of the National Academy of Sciences.
[5] Vladimir Braverman,et al. Benign Overfitting of Constant-Stepsize SGD for Linear Regression , 2021, COLT.
[6] Asuman E. Ozdaglar,et al. A Universally Optimal Multistage Accelerated Stochastic Gradient Method , 2019, NeurIPS.
[7] Prateek Jain,et al. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares) , 2017, FSTTCS.
[8] Julien Mairal,et al. A Generic Acceleration Framework for Stochastic Composite Optimization , 2019, NeurIPS.
[9] Sham M. Kakade,et al. The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure , 2019, NeurIPS.
[10] Sham M. Kakade,et al. A risk comparison of ordinary least squares vs ridge regression , 2011, J. Mach. Learn. Res..
[11] Dmitriy Drusvyatskiy,et al. Stochastic algorithms with geometric step decay converge linearly on sharp functions , 2019, Mathematical Programming.
[12] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[13] Ohad Shamir,et al. Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..
[14] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.
[15] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..
[16] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[17] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[18] Nicolas Flammarion,et al. Last iterate convergence of SGD for Least-Squares in the Interpolation regime , 2021, ArXiv.
[19] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Francis R. Bach,et al. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression , 2016, J. Mach. Learn. Res..
[21] Mark W. Schmidt,et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method , 2012, ArXiv.
[22] Samy Bengio,et al. Understanding deep learning (still) requires rethinking generalization , 2021, Commun. ACM.
[23] Sébastien Bubeck,et al. Theory of Convex Optimization for Machine Learning , 2014, ArXiv.
[24] Prateek Jain,et al. Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification , 2016, J. Mach. Learn. Res..
[25] Elad Hazan,et al. An optimal algorithm for stochastic strongly-convex optimization , 2010, 1006.2425.
[26] Sham M. Kakade,et al. Random Design Analysis of Ridge Regression , 2012, COLT.