Without-Replacement Sampling for Stochastic Gradient Methods
[1] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[2] D. Bertsekas, et al. Convergence Rate of Incremental Subgradient Algorithms, 2000.
[3] Lin Xiao, et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization, 2009, J. Mach. Learn. Res.
[4] Ran El-Yaniv, et al. Transductive Rademacher Complexity and Its Applications, 2007, COLT.
[5] Vincent Nesme, et al. Note on sampling without replacing from a finite collection of matrices, 2010, ArXiv.
[6] Ohad Shamir, et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods, 2011, NIPS.
[7] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).
[8] Maria-Florina Balcan, et al. Distributed Learning, Communication Complexity and Privacy, 2012, COLT.
[9] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[10] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[11] B. Recht, et al. Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences, 2012, arXiv:1202.4184.
[12] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[13] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[14] Mark W. Schmidt, et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012, ArXiv.
[15] Ohad Shamir, et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, 2012, ICML.
[16] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[17] Thomas Hofmann, et al. Communication-Efficient Distributed Dual Coordinate Ascent, 2014, NIPS.
[18] John Langford, et al. A reliable effective terascale linear learning system, 2011, J. Mach. Learn. Res.
[19] Ohad Shamir, et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[20] Ohad Shamir, et al. Distributed stochastic optimization and learning, 2014, 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[21] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[22] Yuchen Zhang, et al. Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, 2015, ArXiv.
[23] Tengyu Ma, et al. Distributed Stochastic Variance Reduced Gradient Methods, 2015, ArXiv.
[24] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[25] Asuman E. Ozdaglar, et al. Why random reshuffling beats stochastic gradient descent, 2015, Mathematical Programming.