O(logT) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions

Traditional algorithms for stochastic optimization require projecting the solution onto a given domain at each iteration to ensure its feasibility. When the domain is complex, such as the positive semidefinite cone, the projection operation can be expensive, leading to a high computational cost per iteration. In this paper, we present a novel algorithm that reduces the number of projections required for stochastic optimization. The proposed algorithm combines the strengths of several recent developments in stochastic optimization, including mini-batches, extra-gradient updates, and epoch gradient descent, in order to effectively exploit smoothness and strong convexity. We show, both in expectation and with high probability, that when the objective function is both smooth and strongly convex, the proposed algorithm achieves the optimal O(1/T) rate of convergence with only O(log T) projections. Our empirical study verifies the theoretical result.
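Since the abstract only outlines the approach, the following is a minimal, hypothetical Python sketch of the general pattern it describes: epoch-based stochastic gradient descent with mini-batch gradient averaging, where the (possibly expensive) projection is applied only once per epoch, so that reaching T total iterations with doubling epoch lengths costs only O(log T) projections. The function and parameter names (projection_sparse_sgd, grad_oracle, project, epoch_len, eta0) are illustrative assumptions, not the paper's algorithm; in particular, the extra-gradient step is omitted for brevity.

    import numpy as np

    def projection_sparse_sgd(grad_oracle, project, w0, n_epochs=8,
                              epoch_len=50, batch_size=5, eta0=0.1):
        # Hypothetical sketch (not the paper's exact method): run SGD in
        # epochs, average mini-batch stochastic gradients, and project onto
        # the feasible domain only once per epoch. With doubling epoch
        # lengths, T total iterations require O(log T) epochs, hence
        # O(log T) projections.
        w = np.asarray(w0, dtype=float)
        T, eta = epoch_len, eta0
        for _ in range(n_epochs):
            z = w.copy()                    # inner iterate, never projected
            running_sum = np.zeros_like(w)
            for _ in range(T):
                # mini-batch of stochastic gradients at the current point
                g = np.mean([grad_oracle(z) for _ in range(batch_size)], axis=0)
                z = z - eta * g
                running_sum += z
            w = project(running_sum / T)    # single projection per epoch
            T *= 2                          # doubling epoch length
            eta /= 2                        # shrink step size (strong convexity)
        return w

    # Toy usage: minimize E[(w - x)^2] over the unit ball, x ~ N(1, 0.01).
    rng = np.random.default_rng(0)
    grad = lambda w: 2.0 * (w - (1.0 + 0.1 * rng.standard_normal(w.shape)))
    proj = lambda w: w / max(1.0, np.linalg.norm(w))
    w_hat = projection_sparse_sgd(grad, proj, w0=np.zeros(3))

The key design point this sketch illustrates is that all inner updates stay unprojected, so the per-iteration cost is dominated by the gradient computation rather than the projection.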
