Vladimir Braverman | Sham M. Kakade | Quanquan Gu | Difan Zou | Jingfeng Wu