BILGO: Bilateral greedy optimization for large scale semidefinite programming

Many machine learning tasks (e.g., metric and manifold learning) can be formulated as convex semidefinite programs. For such tasks to be applied at a large scale, a practical semidefinite programming algorithm must be scalable and computationally efficient. In this paper, we theoretically analyze a new bilateral greedy optimization (BILGO) strategy for solving general semidefinite programs on large-scale datasets. In contrast to existing methods, BILGO employs a bilateral search strategy in each optimization iteration: the current semidefinite matrix solution is updated as a bilateral linear combination of the previous solution and a suitable rank-1 matrix, which can be computed efficiently from the leading eigenvector of the descent direction at that iteration. By optimizing the coefficients of this bilateral combination, BILGO reduces the cost function at every iteration until the KKT conditions are fully satisfied, thus converging to a global optimum. In fact, we prove that BILGO converges to the globally optimal solution at a rate of O(1/k), where k is the iteration counter. The algorithm thus combines the efficiency of conventional rank-1 update algorithms with the effectiveness of gradient descent. Moreover, BILGO is easily extended to handle low-rank constraints. To validate its effectiveness and efficiency, we apply BILGO to two important machine learning tasks, namely Mahalanobis metric learning and maximum variance unfolding. Extensive experimental results clearly demonstrate that BILGO can solve large-scale semidefinite programs efficiently.
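To make the update described in the abstract concrete, the following is a minimal sketch of one BILGO-style iteration for a generic differentiable objective f minimized over the positive semidefinite cone. It is an illustrative sketch, not the authors' implementation: the names (f, grad_f, bilgo_step), the use of a generic bounded solver for the two combination coefficients, and the assumption that the gradient is a symmetric matrix are all assumptions made here.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from scipy.optimize import minimize


def bilgo_step(X, f, grad_f):
    """One bilateral greedy update: X <- alpha * X + beta * v v^T."""
    G = grad_f(X)                      # gradient at the current iterate (assumed symmetric)
    # Rank-1 atom: leading eigenvector of the descent direction -G.
    _, v = eigsh(-G, k=1, which='LA')
    v = v[:, 0]
    V = np.outer(v, v)                 # rank-1 PSD matrix v v^T

    # Bilateral step: jointly optimize the two nonnegative coefficients
    # (alpha, beta) of the combination alpha * X + beta * v v^T.
    def obj(ab):
        a, b = ab
        return f(a * X + b * V)

    res = minimize(obj, x0=np.array([1.0, 0.0]),
                   bounds=[(0.0, None), (0.0, None)])
    alpha, beta = res.x
    return alpha * X + beta * V


# Toy usage: minimize f(X) = ||X - C||_F^2 over the PSD cone for a fixed symmetric C.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 20))
    C = (A + A.T) / 2
    f = lambda X: np.linalg.norm(X - C, 'fro') ** 2
    grad_f = lambda X: 2.0 * (X - C)
    X = np.zeros((20, 20))
    for _ in range(50):
        X = bilgo_step(X, f, grad_f)
    print("objective:", f(X))
```

Since the starting point and the rank-1 atom are both positive semidefinite and the coefficients are constrained to be nonnegative, every iterate stays in the PSD cone. For specific objectives (e.g., quadratics), the two-variable coefficient subproblem typically admits a cheap or closed-form solution, so the generic solver above is only for illustration.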
