O(logT) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions

Traditional algorithms for stochastic optimization require projecting the solution onto a given domain at each iteration to ensure its feasibility. When the domain is complex, such as the positive semidefinite cone, the projection operation can be expensive, leading to a high computational cost per iteration. In this paper, we present a novel algorithm that reduces the number of projections required for stochastic optimization. The proposed algorithm combines the strengths of several recent developments in stochastic optimization, including mini-batches, extra-gradient updates, and epoch gradient descent, in order to effectively exploit smoothness and strong convexity. We show, both in expectation and with high probability, that when the objective function is both smooth and strongly convex, the proposed algorithm achieves the optimal O(1/T) rate of convergence with only O(log T) projections. Our empirical study verifies the theoretical result.
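Since the abstract only outlines the approach, the following is a minimal, hypothetical Python sketch of the general pattern it describes: epoch-based stochastic gradient descent with mini-batch gradient averaging, where the (possibly expensive) projection is applied only once per epoch, so that reaching T total iterations with doubling epoch lengths costs only O(log T) projections. The function and parameter names (projection_sparse_sgd, grad_oracle, project, epoch_len, eta0) are illustrative assumptions, not the paper's algorithm; in particular, the extra-gradient step is omitted for brevity.

    import numpy as np

    def projection_sparse_sgd(grad_oracle, project, w0, n_epochs=8,
                              epoch_len=50, batch_size=5, eta0=0.1):
        # Hypothetical sketch (not the paper's exact method): run SGD in
        # epochs, average mini-batch stochastic gradients, and project onto
        # the feasible domain only once per epoch. With doubling epoch
        # lengths, T total iterations require O(log T) epochs, hence
        # O(log T) projections.
        w = np.asarray(w0, dtype=float)
        T, eta = epoch_len, eta0
        for _ in range(n_epochs):
            z = w.copy()                    # inner iterate, never projected
            running_sum = np.zeros_like(w)
            for _ in range(T):
                # mini-batch of stochastic gradients at the current point
                g = np.mean([grad_oracle(z) for _ in range(batch_size)], axis=0)
                z = z - eta * g
                running_sum += z
            w = project(running_sum / T)    # single projection per epoch
            T *= 2                          # doubling epoch length
            eta /= 2                        # shrink step size (strong convexity)
        return w

    # Toy usage: minimize E[(w - x)^2] over the unit ball, x ~ N(1, 0.01).
    rng = np.random.default_rng(0)
    grad = lambda w: 2.0 * (w - (1.0 + 0.1 * rng.standard_normal(w.shape)))
    proj = lambda w: w / max(1.0, np.linalg.norm(w))
    w_hat = projection_sparse_sgd(grad, proj, w0=np.zeros(3))

The key design point this sketch illustrates is that all inner updates stay unprojected, so the per-iteration cost is dominated by the gradient computation rather than the projection.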
