Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

We suggest a general oracle-based framework that captures parallel stochastic optimization in different parallelization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and the derived lower bounds to study several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the "natural" algorithms are not known to be optimal.
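
To make the dependency-graph viewpoint concrete, below is a minimal sketch of how the two settings named in the abstract might be encoded as graphs. It assumes each node stands for a single stochastic-oracle query and an edge u -> v means the result of query u may be used when issuing query v; the function names and the exact encoding are illustrative assumptions, not code from the paper.

```python
# Illustrative sketch (assumptions, not the paper's code): encode
# parallelization settings as dependency graphs over oracle queries.
# A node is one stochastic-oracle query; an edge u -> v means the
# algorithm may use the result of query u when issuing query v.

def delayed_updates_graph(T, delay):
    """Delayed updates: queries 0..T-1 are issued sequentially, and
    query t may only use results that are at least `delay` steps old."""
    edges = set()
    for t in range(T):
        for u in range(t - delay + 1):  # u <= t - delay
            edges.add((u, t))
    return set(range(T)), edges

def intermittent_communication_graph(M, K, R):
    """Intermittent communication: M machines each run R rounds of K
    sequential local queries, synchronizing between rounds."""
    nodes = {(m, r, k) for m in range(M) for r in range(R) for k in range(K)}
    edges = set()
    for m in range(M):
        for r in range(R):
            for k in range(1, K):  # each machine's local chain of queries
                edges.add(((m, r, k - 1), (m, r, k)))
            if r > 0:
                # After a communication round, every machine sees
                # (transitively, via the local chains) all queries
                # from earlier rounds on all machines.
                for m2 in range(M):
                    edges.add(((m2, r - 1, K - 1), (m, r, 0)))
    return nodes, edges

# Example: 2 machines, 3 local steps per round, 2 rounds.
nodes, edges = intermittent_communication_graph(M=2, K=3, R=2)
print(len(nodes), len(edges))  # 12 nodes, 12 edges
```

In this encoding, lower bounds can depend on graph quantities such as the total number of nodes and the length of the longest chain of dependent queries, which is what makes a single graph-based bound applicable across settings.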
