Optimal Distributed Online Prediction Using Mini-Batches

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speedup over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
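To make the core idea concrete, the sketch below simulates the distributed mini-batch pattern in plain NumPy: k workers each accumulate gradients over their share of a batch of b inputs, all evaluated at the same predictor; the gradients are summed (standing in for the distributed vector-sum the algorithm performs across nodes), and a single serial-style update is applied per batch. The quadratic loss, step size, and synthetic data model here are illustrative assumptions, not the paper's setup, and the sketch ignores the communication latency that the full algorithm explicitly accounts for.

```python
import numpy as np

# Minimal single-process simulation of the distributed mini-batch idea:
# k workers share a batch of b inputs, gradients are summed across workers,
# and one update is applied with the averaged gradient.

rng = np.random.default_rng(0)
dim, k, b, T = 5, 4, 20, 200       # dimension, workers, batch size, rounds
w = np.zeros(dim)                  # shared predictor
w_star = rng.normal(size=dim)      # hidden target used to generate inputs
eta = 0.1                          # step size (illustrative; not tuned as in the paper)

def grad(w, x, y):
    # Gradient of the smooth loss f(w; (x, y)) = 0.5 * (w @ x - y)**2.
    return (w @ x - y) * x

for t in range(T):
    g_sum = np.zeros(dim)
    for worker in range(k):
        # Each worker processes b // k inputs with the *same* predictor w,
        # so the summed gradient matches what a serial algorithm would see
        # on the whole batch.
        for _ in range(b // k):
            x = rng.normal(size=dim)
            y = w_star @ x + 0.1 * rng.normal()
            g_sum += grad(w, x, y)
    # One update per batch, using the averaged gradient. Plain SGD is used
    # here; the framework allows other serial gradient-based update rules.
    w -= eta * g_sum / b

print("distance to target:", np.linalg.norm(w - w_star))
```

Because every worker evaluates its gradients at the same point, the averaged gradient over the batch is statistically identical to the one a serial algorithm would compute on stochastic inputs, which is what allows the serial regret analysis to carry over to the distributed setting.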
