Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors). Our approach allows a communication-memory tradeoff: either logarithmic communication with linear memory, or polynomial communication with a corresponding polynomial reduction in required memory. This tradeoff is achieved through minibatch-prox iterations (minibatch passive-aggressive updates), in which a regularized subproblem on a minibatch is solved at each iteration. We provide a novel analysis of this minibatch-prox procedure showing that it achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.
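To make the per-iteration subproblem concrete, below is a minimal sketch of a generic minibatch-prox step: given the current iterate, it approximately minimizes the average loss on a minibatch plus a proximal term tying the solution to the current iterate. The squared loss, the gradient-descent inner solver, and all parameter names (`eta`, `inner_steps`, `inner_lr`) are illustrative assumptions, not details taken from the paper; any inner solver for the subproblem could be substituted.

```python
import numpy as np

def minibatch_prox_step(w, X_batch, y_batch, eta, inner_steps=50, inner_lr=0.1):
    """One (hypothetical) minibatch-prox update with a squared loss:
    approximately solve
        min_v  (1/b) * sum_i 0.5 * (x_i^T v - y_i)^2  +  (1 / (2*eta)) * ||v - w||^2
    by plain gradient descent on the subproblem."""
    v = w.copy()
    b = X_batch.shape[0]
    for _ in range(inner_steps):
        # gradient of the minibatch loss plus the proximal term
        grad = X_batch.T @ (X_batch @ v - y_batch) / b + (v - w) / eta
        v -= inner_lr * grad
    return v

# Toy usage: stream fresh minibatches and apply the prox update.
rng = np.random.default_rng(0)
d, b, T = 5, 32, 20
w_star = rng.normal(size=d)          # synthetic ground-truth model
w = np.zeros(d)
for t in range(T):
    X = rng.normal(size=(b, d))
    y = X @ w_star + 0.1 * rng.normal(size=b)
    w = minibatch_prox_step(w, X, y, eta=1.0)
```

In a distributed setting, the idea is that the minibatch can be partitioned across machines and the subproblem solved cooperatively, which is what drives the communication-memory tradeoff described above; the sketch here only shows the single-machine update.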
