Stochastic Conditional Gradient Method for Composite Convex Minimization

In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are widely applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement, and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation for the objective residual and an $\mathcal{O}(k^{-5/12})$ rate in expectation for the feasibility gap. These rates are achieved without increasing the batch size, which can be as small as a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including the optimization of stochastic SDPs.
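To make the template concrete, here is a minimal NumPy sketch under stated assumptions. The toy instance minimizes $\mathbb{E}_\xi[\tfrac12\|x-\xi\|^2]$ over the probability simplex subject to one affine constraint $Ax=b$ forcing the first two coordinates to be equal, with $\xi$ Gaussian around a fixed mean. The loop combines three ingredients commonly used by stochastic conditional gradient (Frank-Wolfe) methods in this setting: a running average of single-sample stochastic gradients, a quadratic penalty on $Ax-b$ whose weight $1/\beta_k$ grows over the iterations, and a linear minimization oracle (LMO) over the feasible set instead of a projection. The function names, the toy problem, and the schedules for $\eta_k$, $\rho_k$ and $\beta_k$ are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np


def simplex_lmo(v):
    """LMO for the probability simplex: argmin_{s in simplex} <v, s>
    is the vertex e_i with i = argmin_i v_i."""
    s = np.zeros_like(v)
    s[np.argmin(v)] = 1.0
    return s


def stochastic_cgm_penalty(stoch_grad, A, b, x0, num_iters, beta0=1.0, rng=None):
    """Hypothetical sketch of a stochastic conditional gradient loop for
    min_{x in simplex} E[f(x, xi)] subject to A x = b.

    Assumed (illustrative) choices:
      * gradient estimator d_k = (1 - rho_k) d_{k-1} + rho_k * g_k
      * affine constraint handled by the penalty (1 / (2 beta_k)) ||A x - b||^2
      * eta_k = 9/(k+8), rho_k = 4/(k+7)^(2/3), beta_k = beta0/sqrt(k+8)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros_like(x)
    for k in range(1, num_iters + 1):
        eta = 9.0 / (k + 8)
        rho = min(1.0, 4.0 / (k + 7) ** (2.0 / 3.0))
        beta = beta0 / np.sqrt(k + 8)
        # One stochastic gradient per iteration (batch size 1).
        g = stoch_grad(x, rng)
        # Averaging the stochastic gradients reduces their variance over time.
        d = (1.0 - rho) * d + rho * g
        # Add the gradient of the quadratic penalty on the affine constraint.
        v = d + A.T @ (A @ x - b) / beta
        s = simplex_lmo(v)
        # Convex combination (eta <= 1) keeps x in the simplex: no projection needed.
        x = x + eta * (s - x)
    return x


# Toy instance (assumed): f(x, xi) = 0.5 * ||x - xi||^2 with xi ~ N(mu, 0.1^2 I).
mu = np.array([0.5, 0.3, 0.2])


def stoch_grad(x, rng):
    xi = mu + 0.1 * rng.standard_normal(mu.shape)
    return x - xi  # gradient of 0.5 * ||x - xi||^2 with respect to x


A = np.array([[1.0, -1.0, 0.0]])  # constraint: x[0] - x[1] = 0
b = np.array([0.0])

x = stochastic_cgm_penalty(stoch_grad, A, b, np.array([1.0, 0.0, 0.0]),
                           num_iters=20000, rng=np.random.default_rng(0))
print("x =", x)
print("feasibility gap =", abs(float((A @ x - b)[0])))
print("distance to (0.4, 0.4, 0.2) =", np.linalg.norm(x - np.array([0.4, 0.4, 0.2])))
```

For this toy problem the constrained minimizer is $(0.4, 0.4, 0.2)$ (the KKT conditions hold with multiplier $0.1$ on the affine constraint), so the printed feasibility gap and distance act as a quick sanity check. For an SDP over the spectrahedron $\{X \succeq 0,\ \operatorname{tr} X = 1\}$, the simplex LMO would be replaced by $uu^\top$, where $u$ is an eigenvector for the smallest eigenvalue of $v$; this rank-one oracle is what keeps conditional gradient methods tractable at scale.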