Stochastic Gradient Descent with Only One Projection

Although many variants of stochastic gradient descent (SGD) have been proposed for large-scale convex optimization, most of them project the solution at each iteration to keep it within the feasible domain. For complex domains (e.g., the positive semidefinite cone), this projection step can be computationally expensive, making SGD unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not require intermediate projections; instead, a single projection at the final iteration suffices to produce a feasible solution. Our theoretical analysis shows that, with high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization and an O(ln T/T) rate for strongly convex optimization, under mild conditions on the domain and the objective function.
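
To make the one-projection idea concrete, below is a minimal Python sketch on a toy problem. Everything in it is illustrative rather than the paper's exact algorithm: the least-squares objective, the unit-ball domain encoded as g(x) = ||x|| - 1 <= 0, the penalty weight `lam`, and the 1/√t step size are all assumptions chosen for the demo. The structural point it demonstrates matches the abstract: no projection inside the loop, one projection of the averaged iterate at the end.

```python
import numpy as np

# Toy instance (hypothetical, for illustration only): minimize
# E[(a^T x - b)^2] over the unit Euclidean ball {x : ||x|| <= 1}.
rng = np.random.default_rng(0)
d, T = 20, 5000
x_star = rng.normal(size=d)
x_star /= 2 * np.linalg.norm(x_star)   # ground truth strictly inside the ball

def stochastic_grad(x):
    """One-sample gradient of the least-squares loss at x."""
    a = rng.normal(size=d)
    b = a @ x_star + 0.1 * rng.normal()
    return 2 * (a @ x - b) * a

def g(x):
    """Constraint function: g(x) <= 0 encodes the feasible domain."""
    return np.linalg.norm(x) - 1.0

def grad_g(x):
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

x = np.zeros(d)
avg = np.zeros(d)
lam = 10.0                    # penalty weight on constraint violation (assumed)
for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)    # O(1/sqrt(t)) step size for general convex losses
    grad = stochastic_grad(x)
    if g(x) > 0:              # outside the domain: penalty term pulls x back
        grad = grad + lam * grad_g(x)
    x = x - eta * grad        # note: NO projection inside the loop
    avg += (x - avg) / t      # running average of the iterates

# The single projection: map the averaged solution onto the unit ball.
x_hat = avg / max(np.linalg.norm(avg), 1.0)
print("final constraint value g(x_hat) =", g(x_hat))
```

Running this, `x_hat` is feasible by construction, and the penalty keeps the unprojected iterates from drifting far outside the ball, which is what allows the single final projection to cost only a small, controlled amount of accuracy.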
