A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Prateek Jain | Sham M. Kakade | Venkata Krishna Pillutla | Praneeth Netrapalli | Aaron Sidford | Rahul Kidambi
[1] Sham M. Kakade, et al. Competing with the Empirical Risk Minimizer in a Single Pass, 2014, COLT.
[2] E. L. Lehmann, et al. Theory of point estimation, 1950.
[3] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[4] F. Bach, et al. Non-parametric Stochastic Approximation with Large Step sizes, 2014, 1408.0361.
[5] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[6] Prateek Jain, et al. Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016, ArXiv.
[7] Francis R. Bach, et al. Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions, 2015, AISTATS.
[8] G. Pflug. Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, H.J.; Clark, D.S., 1980.
[9] Francis R. Bach, et al. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, 2013, J. Mach. Learn. Res.