Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms
[1] I. Pinelis, et al. Remarks on Inequalities for Large Deviation Probabilities, 1986.
[2] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[3] J. Fujii, et al. Norm inequalities equivalent to Heinz inequality, 1993.
[4] Bernhard Schölkopf, et al. Sparse Greedy Matrix Approximation for Machine Learning, 2000, ICML.
[5] Christopher K. I. Williams, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.
[6] Bernhard Schölkopf, Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2002, MIT Press.
[7] John Shawe-Taylor, Nello Cristianini. Kernel Methods for Pattern Analysis, 2004, Cambridge University Press.
[8] Tong Zhang, et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms, 2004, ICML.
[9] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.
[10] Bin Yu, et al. Boosting with early stopping: Convergence and consistency, 2005, arXiv:math/0508276.
[11] Tong Zhang, et al. Learning Bounds for Kernel Regression Using Effective Data Dimensionality, 2005, Neural Computation.
[12] Emmanuel J. Candès, et al. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, 2004, IEEE Transactions on Information Theory.
[13] Yiming Ying, et al. Learning Rates of Least-Square Regularized Regression, 2006, Found. Comput. Math.
[14] Yuan Yao, et al. Online Learning Algorithms, 2006, Found. Comput. Math.
[15] Lorenzo Rosasco, et al. On regularization algorithms in learning theory, 2007, J. Complex.
[16] H. Robbins, S. Monro. A Stochastic Approximation Method, 1951, The Annals of Mathematical Statistics.
[17] Ding-Xuan Zhou, et al. Learning Theory: An Approximation Theory Viewpoint, 2007.
[18] A. Caponnetto, et al. Optimal Rates for the Regularized Least-Squares Algorithm, 2007, Found. Comput. Math.
[19] S. Smale, et al. Learning Theory Estimates via Integral Operators and Their Approximations, 2007.
[20] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[21] Ingo Steinwart, Andreas Christmann. Support Vector Machines, 2008, Springer.
[22] Massimiliano Pontil, et al. Online Gradient Descent Learning Algorithms, 2008, Found. Comput. Math.
[23] Lorenzo Rosasco, et al. Spectral Algorithms for Supervised Learning, 2008, Neural Computation.
[24] Don R. Hush, et al. Optimal Rates for Regularized Least Squares Regression, 2009, COLT.
[25] Gideon S. Mann, et al. Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models, 2009, NIPS.
[26] Alexander Shapiro, et al. Robust Stochastic Approximation Approach to Stochastic Programming, 2009, SIAM Journal on Optimization.
[27] Gilles Blanchard, et al. Optimal learning rates for Kernel Conjugate Gradient regression, 2010, NIPS.
[28] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[29] Y. Yao, et al. Cross-validation based adaptation for regularization operators in learning theory, 2010.
[30] Stanislav Minsker. On Some Extensions of Bernstein's Inequality for Self-adjoint Operators, 2011, arXiv:1112.5448.
[31] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).
[32] Rong Jin, et al. Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison, 2012, NIPS.
[33] J. Tropp. User-Friendly Tools for Random Matrices: An Introduction, 2012.
[34] Sham M. Kakade, et al. Random Design Analysis of Ridge Regression, 2012, COLT.
[35] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[36] Yuan Yao, et al. Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence, 2011, IEEE Transactions on Information Theory.
[37] F. Bach, et al. Non-parametric Stochastic Approximation with Large Step Sizes, 2014, arXiv:1408.0361.
[38] Martin J. Wainwright, et al. Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates, 2013, J. Mach. Learn. Res.
[39] Michael W. Mahoney, et al. Fast Randomized Kernel Ridge Regression with Statistical Guarantees, 2015, NIPS.
[40] Martin J. Wainwright, et al. Randomized sketches for kernels: Fast and optimal non-parametric regression, 2015, arXiv.
[41] Lorenzo Rosasco, et al. Less is More: Nyström Computational Regularization, 2015, NIPS.
[42] Ding-Xuan Zhou, et al. Learning theory of randomized Kaczmarz algorithm, 2015, J. Mach. Learn. Res.
[43] Steven C. H. Hoi, et al. Large Scale Online Kernel Learning, 2016, J. Mach. Learn. Res.
[44] Ingo Steinwart, et al. Optimal Learning Rates for Localized SVMs, 2015, J. Mach. Learn. Res.
[45] G. Blanchard, et al. Parallelizing Spectral Algorithms for Kernel Learning, 2016, arXiv:1610.07487.
[46] Pradeep Ravikumar, et al. Kernel Ridge Regression via Partitioning, 2016, arXiv.
[47] Prateek Jain, et al. Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016, arXiv.
[48] Heinz W. Engl, et al. Regularization of Inverse Problems, 1996, Kluwer.
[49] Lorenzo Rosasco, et al. Optimal Rates for Multi-pass Stochastic Gradient Methods, 2016, J. Mach. Learn. Res.
[50] Qiang Liu, et al. Communication-efficient Sparse Regression, 2017, J. Mach. Learn. Res.
[51] Ben London, et al. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent, 2017, NIPS.
[52] Nicole Mücke. Reducing training time by efficient localized kernel regression, 2017, arXiv:1707.03220.
[53] Ding-Xuan Zhou, et al. Learning theory of distributed spectral algorithms, 2017.
[54] Lorenzo Rosasco, et al. FALKON: An Optimal Large Scale Kernel Method, 2017, NIPS.
[55] Ingo Steinwart, et al. Spatial Decompositions for Large Scale SVMs, 2017, AISTATS.
[56] Prateek Jain, et al. Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification, 2016, J. Mach. Learn. Res.
[57] Daniel J. Hsu, et al. Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators, 2017.
[58] Volkan Cevher, et al. Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage, 2017, AISTATS.
[59] Lorenzo Rosasco, et al. Optimal Rates for Learning with Nyström Stochastic Gradient Methods, 2017, arXiv.
[60] Ding-Xuan Zhou, et al. Distributed Learning with Regularized Least Squares, 2016, J. Mach. Learn. Res.
[61] Volkan Cevher, et al. Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods, 2018, ICML.
[62] Gilles Blanchard, et al. Parallelizing Spectrally Regularized Kernel Algorithms, 2018, J. Mach. Learn. Res.
[63] Gilles Blanchard, et al. Optimal Rates for Regularization of Statistical Inverse Learning Problems, 2016, Found. Comput. Math.
[64] Volkan Cevher, et al. Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms, 2018, arXiv.
[65] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[66] Alessandro Rudi, et al. Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes, 2018, NeurIPS.
[67] Lorenzo Rosasco, et al. Learning with SGD and Random Features, 2018, NeurIPS.
[68] Lorenzo Rosasco, et al. Beating SGD Saturation with Tail-Averaging and Minibatching, 2019, NeurIPS.
[69] Dominic Richards, et al. Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up, 2019, NeurIPS.
[70] Francesco Orabona, et al. Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration, 2019, NeurIPS.
[71] Lorenzo Rosasco, et al. Implicit Regularization of Accelerated Methods in Hilbert Spaces, 2019, NeurIPS.
[72] Ingo Steinwart, et al. Sobolev Norm Learning Rates for Regularized Least-Squares Algorithms, 2017, J. Mach. Learn. Res.