Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Alessandro Rudi | Francis Bach | Loucas Pillaud-Vivien