Fast column generation for atomic norm regularization

We consider optimization problems that consist of minimizing a quadratic function under an atomic norm regularization or constraint. Following the line of work on conditional gradient algorithms, we show that the fully corrective Frank-Wolfe (FCFW) algorithm, which is most naturally reformulated as a column generation algorithm in the regularized case, can be made particularly efficient for difficult problems in this family by solving the simplicial or conical subproblems produced by FCFW with a special instance of a classical active-set algorithm for quadratic programming (Nocedal and Wright, 2006) that generalizes the min-norm point algorithm (Wolfe, 1976). Our experiments show that the algorithm takes advantage of warm starts and of the sparsity induced by the norm, displays fast linear convergence, and clearly outperforms the state of the art for both complex and classical norms, including the standard group Lasso.
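
To make the scheme concrete, here is a minimal sketch (not the paper's implementation) of fully corrective Frank-Wolfe for the constrained case, min_x 0.5*||Ax - b||^2 subject to ||x||_atomic <= tau, instantiated on the l1 ball, whose atoms are the signed canonical basis vectors. All names here (fcfw, lmo_l1, tau) are ours, and the corrective step calls SciPy's generic SLSQP solver rather than the specialized active-set QP solver the paper advocates.

```python
# Sketch of fully corrective Frank-Wolfe (FCFW) for
#     min_x  0.5 * ||A x - b||^2   s.t.   ||x||_atomic <= tau,
# illustrated with the l1 ball (atoms = +/- e_i), one instance of an
# atomic norm. Assumption: the corrective step below uses a generic
# simplex-constrained QP solve, not the paper's active-set method.
import numpy as np
from scipy.optimize import minimize

def lmo_l1(grad):
    """Linear minimization oracle for the unit l1 ball: returns the
    signed basis vector most correlated with the negative gradient."""
    i = np.argmax(np.abs(grad))
    atom = np.zeros_like(grad)
    atom[i] = -np.sign(grad[i])
    return atom

def fcfw(A, b, tau, n_iter=50, tol=1e-8):
    n = A.shape[1]
    x = np.zeros(n)
    atoms = []                          # columns generated so far
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        s = tau * lmo_l1(grad)          # new atom from the LMO
        if grad @ (x - s) < tol:        # Frank-Wolfe duality gap
            break
        atoms.append(s)
        S = np.column_stack(atoms)
        # Fully corrective step: re-optimize over conv(atoms), i.e. solve
        # the simplicial subproblem in the weights c, with x = S c.
        obj = lambda c: 0.5 * np.sum((A @ (S @ c) - b) ** 2)
        c0 = np.full(S.shape[1], 1.0 / S.shape[1])
        res = minimize(obj, c0, bounds=[(0.0, 1.0)] * S.shape[1],
                       constraints={'type': 'eq',
                                    'fun': lambda c: np.sum(c) - 1.0})
        x = S @ res.x
        # Prune atoms whose weight hit zero, keeping the active set small.
        atoms = [a for a, w in zip(atoms, res.x) if w > 1e-10]
    return x
```

The point of the abstract is precisely that this corrective step dominates the cost when solved naively as above; replacing it with a warm-started active-set QP method in the spirit of the min-norm point algorithm (Wolfe, 1976) is what makes FCFW efficient here.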

[1] Martin Jaggi, et al. On the Global Linear Convergence of Frank-Wolfe Optimization Variants, 2015, NIPS.

[2] Yi Ma, et al. Robust principal component analysis?, 2009, JACM.

[3] Massimiliano Pontil, et al. Structured Sparsity and Generalization, 2011, J. Mach. Learn. Res.

[4] Jean-Philippe Vert, et al. Tight convex relaxations for sparse matrix factorization, 2014, NIPS.

[5] Chris H. Q. Ding, et al. Convex and Semi-Nonnegative Matrix Factorizations, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Yurii Nesterov, et al. Complexity bounds for primal-dual methods minimizing the model of objective function, 2017, Mathematical Programming.

[7] Toru Maruyama. On some developments in convex analysis (in Japanese), 1977.

[8] R. Pace, et al. Sparse spatial autoregressions, 1997.

[9] Jean Ponce, et al. Convex Sparse Matrix Factorizations, 2008, arXiv.

[10] Ruslan Salakhutdinov, et al. Matrix reconstruction with the local max norm, 2012, NIPS.

[11] Stephen J. Wright, et al. Forward-Backward Greedy Algorithms for Atomic Norm Regularization, 2014, IEEE Transactions on Signal Processing.

[12] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[13] Anders Forsgren, et al. Primal and dual active-set methods for convex quadratic programming, 2015, Mathematical Programming.

[14] Dirk A. Lorenz, et al. A generalized conditional gradient method and its connection to an iterative shrinkage method, 2009, Comput. Optim. Appl.

[15] Francis R. Bach, et al. Learning with Submodular Functions: A Convex Optimization Perspective, 2011, Found. Trends Mach. Learn.

[16] Yaoliang Yu, et al. Accelerated Training for Matrix-norm Regularization: A Boosting Approach, 2012, NIPS.

[17] Francis R. Bach, et al. Duality Between Subgradient and Conditional Gradient Methods, 2012, SIAM J. Optim.

[18] G. Obozinski, et al. A unified perspective on convex structured sparsity: Hierarchical, symmetric, submodular norms and beyond, 2016.

[19] Jorge Nocedal and Stephen J. Wright. Numerical Optimization, 2006, Springer.

[20] Zhaoran Wang, et al. Low-Rank and Sparse Structure Pursuit via Alternating Minimization, 2016, AISTATS.

[21] Yaoliang Yu, et al. Generalized Conditional Gradient for Sparse Estimation, 2014, J. Mach. Learn. Res.

[22] Jean-Philippe Vert, et al. Group Lasso with Overlaps: the Latent Group Lasso approach, 2011, arXiv.

[23] Katya Scheinberg, et al. Efficient Block-coordinate Descent Algorithms for the Group Lasso, 2013, Math. Program. Comput.

[24] Pablo A. Parrilo, et al. Rank-Sparsity Incoherence for Matrix Decomposition, 2009, SIAM J. Optim.

[25] Masashi Sugiyama, et al. Multitask learning meets tensor factorization: task imputation via convex optimization, 2014, NIPS.

[26] Jean-Philippe Vert, et al. Group lasso with overlap and graph lasso, 2009, ICML.

[27] Ambuj Tewari, et al. Stochastic methods for l1 regularized loss minimization, 2009, ICML.

[28] Jieping Ye, et al. Tensor Completion for Estimating Missing Values in Visual Data, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Johan A. K. Suykens, et al. Hybrid Conditional Gradient-Smoothing Algorithms with Applications to Sparse and Low Rank Regularization, 2014, arXiv.

[30] Taiji Suzuki, et al. Convex Tensor Decomposition via Structured Schatten Norm Regularization, 2013, NIPS.

[31] Peter Wonka, et al. Tensor Completion for Estimating Missing Values in Visual Data, 2013.

[32] Y. She, et al. Group Regularized Estimation Under Structural Hierarchy, 2014, arXiv:1411.4691.

[33] Xiaohan Yan, et al. Hierarchical Sparse Modeling: A Choice of Two Regularizers, 2015.

[34] Philip Wolfe. Finding the nearest point in a polytope, 1976, Math. Program.

[35] Pablo A. Parrilo, et al. The Convex Geometry of Linear Inverse Problems, 2010, Foundations of Computational Mathematics.

[36] Zaïd Harchaoui, et al. Conditional gradient algorithms for norm-regularized smooth convex optimization, 2013, Math. Program.

[37] R. Tibshirani, et al. A Lasso for Hierarchical Interactions, 2012, Annals of Statistics.

[38] Martin J. Wainwright, et al. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions, 2011, ICML.