Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification
Prateek Jain | Sham M. Kakade | Praneeth Netrapalli | Aaron Sidford | Rahul Kidambi
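The title refers to running mini-batch SGD on the least squares objective and averaging the iterates (tail-averaging: keeping the mean of only the final iterates). A minimal sketch of that general recipe in Python/NumPy follows, assuming i.i.d. streaming samples with y = <w*, x> + noise; the function name, hyperparameters, and constant step size are illustrative assumptions, not the paper's exact parallelized estimator or step-size choices.

import numpy as np

def minibatch_tail_averaged_sgd(sample_batch, d, n_steps=2000,
                                batch_size=32, step_size=0.1,
                                tail_fraction=0.5):
    """Mini-batch SGD on the least squares objective E[(y - <w, x>)^2],
    returning the tail-average (mean of the final iterates).

    sample_batch(b) must return (X, y) with X of shape (b, d) and y of
    shape (b,), drawn i.i.d. All defaults here are illustrative.
    """
    w = np.zeros(d)
    tail_start = int(n_steps * (1.0 - tail_fraction))
    w_sum = np.zeros(d)
    n_tail = 0
    for t in range(n_steps):
        X, y = sample_batch(batch_size)
        # Mini-batch stochastic gradient of the squared loss.
        grad = X.T @ (X @ w - y) / batch_size
        w = w - step_size * grad
        # Tail-averaging: accumulate only iterates from the tail onward.
        if t >= tail_start:
            w_sum += w
            n_tail += 1
    return w_sum / n_tail

A toy usage, with a hypothetical sampler that draws noisy linear measurements:

# Recover w_star from streaming samples y = <w_star, x> + noise.
rng = np.random.default_rng(0)
w_star = rng.normal(size=10)

def sample_batch(b):
    X = rng.normal(size=(b, 10))
    return X, X @ w_star + 0.1 * rng.normal(size=b)

w_hat = minibatch_tail_averaged_sgd(sample_batch, d=10)

Averaging only the tail discards the early transient iterates while still suppressing the variance of the final iterate, which is the rationale for tail-averaging over averaging all iterates.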