Convex Optimization without Projection Steps

For the general problem of minimizing a convex function over a compact convex domain, we will investigate a simple iterative approximation algorithm based on the method by Frank & Wolfe 1956, that does not need projection steps in order to stay inside the optimization domain. Instead of a projection step, the linearized problem defined by a current subgradient is solved, which gives a step direction that will naturally stay in the domain. Our framework generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to arbitrary convex domains. We give a convergence proof guaranteeing {\epsilon}-small duality gap after O(1/{\epsilon}) iterations. The method allows us to understand the sparsity of approximate solutions for any l1-regularized convex optimization problem (and for optimization over the simplex), expressed as a function of the approximation quality. We obtain matching upper and lower bounds of {\Theta}(1/{\epsilon}) for the sparsity for l1-problems. The same bounds apply to low-rank semidefinite optimization with bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well. As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero entries as {\epsilon}-approximate solutions when optimizing any convex function over a class of diagonally dominant symmetric matrices. We show that our proposed first-order method also applies to nuclear norm and max-norm matrix optimization problems. For nuclear norm regularized optimization, such as matrix completion and low-rank recovery, we demonstrate the practical efficiency and scalability of our algorithm for large matrix problems, as e.g. the Netflix dataset. For general convex optimization over bounded matrix max-norm, our algorithm is the first with a convergence guarantee, to the best of our knowledge.

[1]  Satyen Kale Efficient algorithms using the multiplicative weights update method , 2007 .

[2]  J. Dunn Convergence Rates for Conditional Gradient Sequences Generated by Implicit Step Length Rules , 1980 .

[3]  N. S. Aybat,et al.  Fast First-Order Methods for Stable Principal Component Pursuit , 2011, 1105.2126.

[4]  Yong-Jin Liu,et al.  An implementable proximal point algorithmic framework for nuclear norm minimization , 2012, Math. Program..

[5]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[6]  Robert Tibshirani,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010, J. Mach. Learn. Res..

[7]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[8]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[9]  M. Zabarankin,et al.  Convex functional analysis , 2005 .

[10]  Dennis M. Wilkinson,et al.  Large-Scale Parallel Collaborative Filtering for the Netflix Prize , 2008, AAIM.

[11]  Benjamin Recht,et al.  Probability of unique integer solution to a system of linear equations , 2011, Eur. J. Oper. Res..

[12]  L. Jones A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training , 1992 .

[13]  Tapani Raiko,et al.  Tkk Reports in Information and Computer Science Practical Approaches to Principal Component Analysis in the Presence of Missing Values Tkk Reports in Information and Computer Science Practical Approaches to Principal Component Analysis in the Presence of Missing Values , 2022 .

[14]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[15]  Martin Jaggi,et al.  A Simple Algorithm for Nuclear Norm Regularized Problems , 2010, ICML.

[16]  M. Baes,et al.  Smoothing techniques for solving semidefinite programs with many constraints , 2009 .

[17]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[18]  Tong Zhang,et al.  Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints , 2010, SIAM J. Optim..

[19]  Shiqian Ma,et al.  Fixed point and Bregman iterative methods for matrix rank minimization , 2009, Math. Program..

[20]  Renato D. C. Monteiro,et al.  A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization , 2003, Math. Program..

[21]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[22]  Stefan Schaal,et al.  Fast, robust quadruped locomotion over challenging terrain , 2010, 2010 IEEE International Conference on Robotics and Automation.

[23]  Hao Wang,et al.  PSVM : Parallelizing Support Vector Machines on Distributed Computers , 2007 .

[24]  Yurii Nesterov,et al.  Primal-dual subgradient methods for convex problems , 2005, Math. Program..

[25]  Martin Jaggi,et al.  Sparse Convex Optimization Methods for Machine Learning , 2011 .

[26]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..

[27]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[28]  Tommi S. Jaakkola,et al.  Weighted Low-Rank Approximations , 2003, ICML.

[29]  Arkadi Nemirovski,et al.  Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems , 2004, SIAM J. Optim..

[30]  Mário A. T. Figueiredo,et al.  Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems , 2007, IEEE Journal of Selected Topics in Signal Processing.

[31]  Kenneth L. Clarkson,et al.  Optimal core-sets for balls , 2008, Comput. Geom..

[32]  Y. Nesterov Gradient methods for minimizing composite objective function , 2007 .

[33]  M. Patriksson Partial linearization methods in nonlinear programming , 1993 .

[34]  L. Ghaoui,et al.  Sparse PCA: Convex Relaxations, Algorithms and Applications , 2010, 1011.3781.

[35]  S. Kakade,et al.  On the duality of strong convexity and strong smoothness : Learning applications and matrix regularization , 2009 .

[36]  Arkadi Nemirovski,et al.  Non-euclidean restricted memory level method for large-scale convex optimization , 2005, Math. Program..

[37]  Christopher Ré,et al.  Parallel stochastic gradient algorithms for large-scale matrix completion , 2013, Mathematical Programming Computation.

[38]  Bernd Gärtner,et al.  Understanding and using linear programming , 2007, Universitext.

[39]  Stephen P. Boyd,et al.  A rank minimization heuristic with application to minimum order system approximation , 2001, Proceedings of the 2001 American Control Conference. (Cat. No.01CH37148).

[40]  Stephen P. Boyd,et al.  An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007, J. Mach. Learn. Res..

[41]  Ruslan Salakhutdinov,et al.  Practical Large-Scale Optimization for Max-norm Regularization , 2010, NIPS.

[42]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[43]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[44]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[45]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[46]  Joel A. Tropp,et al.  Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit , 2007, IEEE Transactions on Information Theory.

[47]  Elad Hazan,et al.  Sparse Approximate Solutions to Semidefinite Programs , 2008, LATIN.

[48]  Elad Hazan The convex optimization approach to regret minimization , 2011 .

[49]  Tong Zhang,et al.  Sequential greedy approximation for certain convex optimization problems , 2003, IEEE Trans. Inf. Theory.

[50]  László Lovász,et al.  Submodular functions and convexity , 1982, ISMP.

[51]  Chih-Jen Lin,et al.  Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.

[52]  Michael J. Todd,et al.  On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids , 2007, Discret. Appl. Math..

[53]  Peter J. Haas,et al.  Large-scale matrix factorization with distributed stochastic gradient descent , 2011, KDD.

[54]  Marc Teboulle,et al.  Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..

[55]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[56]  Arkadiusz Paterek,et al.  Improving regularized singular value decomposition for collaborative filtering , 2007 .

[57]  Nathan Linial,et al.  Complexity measures of sign matrices , 2007, Comb..

[58]  Arkadi Nemirovski,et al.  The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography , 2001, SIAM J. Optim..

[59]  Martin Jaggi,et al.  Regularization Paths with Guarantees for Convex Semidefinite Optimization , 2012, AISTATS.

[60]  Tong Zhang,et al.  Sparse Recovery With Orthogonal Matching Pursuit Under RIP , 2010, IEEE Transactions on Information Theory.

[61]  S. Yun,et al.  An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems , 2009 .

[62]  Arvind Ganesh,et al.  Fast algorithms for recovering a corrupted low-rank matrix , 2009, 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).

[63]  Nikhil Srivastava,et al.  Twice-ramanujan sparsifiers , 2008, STOC '09.

[64]  丸山 徹 Convex Analysisの二,三の進展について , 1977 .

[65]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[66]  Sanjeev Arora,et al.  Fast algorithms for approximate semidefinite programming using the multiplicative weights update method , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[67]  Alexander J. Smola,et al.  Improving maximum margin matrix factorization , 2008, Machine Learning.

[68]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[69]  J. Borwein,et al.  Convex Analysis And Nonlinear Optimization , 2000 .

[70]  Martin Jaggi,et al.  Approximating Parameterized Convex Optimization Problems , 2010, ESA.

[71]  Dennis DeCoste,et al.  Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations , 2006, ICML.

[72]  Ruslan Salakhutdinov,et al.  Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm , 2010, NIPS.

[73]  Henryk Wozniakowski,et al.  Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start , 1992, SIAM J. Matrix Anal. Appl..

[74]  Balas K. Natarajan,et al.  Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..

[75]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[76]  Pavel Berkhin,et al.  A Survey on PageRank Computing , 2005, Internet Math..

[77]  Bernd Grtner,et al.  Approximation Algorithms and Semidefinite Programming , 2012 .

[78]  Adi Shraibman,et al.  Rank, Trace-Norm and Max-Norm , 2005, COLT.

[79]  Edward Y. Chang,et al.  Parallelizing Support Vector Machines on Distributed Computers , 2007, NIPS.

[80]  Nicolas Gillis,et al.  Low-Rank Matrix Approximation with Weights or Missing Data Is NP-Hard , 2010, SIAM J. Matrix Anal. Appl..

[81]  Michael Patriksson,et al.  Cost Approximation: A Unified Framework of Descent Algorithms for Nonlinear Programs , 1998, SIAM J. Optim..

[82]  Julien Mairal,et al.  Optimization with Sparsity-Inducing Penalties , 2011, Found. Trends Mach. Learn..

[83]  Nathan Srebro,et al.  Fast maximum margin matrix factorization for collaborative prediction , 2005, ICML.

[84]  J. Kuczy,et al.  Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start , 1992 .

[85]  J. Dunn,et al.  Conditional gradient algorithms with open loop step size rules , 1978 .

[86]  Domonkos Tikk,et al.  Scalable Collaborative Filtering Approaches for Large Recommender Systems , 2009, J. Mach. Learn. Res..

[87]  Jieping Ye,et al.  An accelerated gradient method for trace norm minimization , 2009, ICML '09.

[88]  András A. Benczúr,et al.  Methods for large scale SVD with missing values , 2007 .

[89]  M. Wu,et al.  Collaborative Filtering via Ensembles of Matrix Factorizations , 2007, KDD 2007.

[90]  Joel A. Tropp,et al.  Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[91]  Alexandre d'Aspremont,et al.  Smooth Optimization with Approximate Gradient , 2005, SIAM J. Optim..

[92]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[93]  Martin Jaggi,et al.  Coresets for polytope distance , 2009, SCG '09.

[94]  E. Gilbert An Iterative Procedure for Computing the Minimum of a Quadratic Form on a Convex Set , 1966 .

[95]  David P. Williamson,et al.  Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming , 1995, JACM.

[96]  Neil D. Lawrence,et al.  Non-linear matrix factorization with Gaussian processes , 2009, ICML '09.

[97]  Aharon Ben-Tal,et al.  Lectures on modern convex optimization , 1987 .

[98]  Hsueh-I Lu,et al.  Efficient approximation algorithms for semidefinite programs arising from MAX CUT and COLORING , 1996, STOC '96.

[99]  Pankaj K. Agarwal,et al.  Approximation Algorithms for k-Line Center , 2002, ESA.

[100]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[101]  Piotr Indyk,et al.  Approximate clustering via core-sets , 2002, STOC '02.