Network Flow Algorithms for Structured Sparsity

We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of l∞-norms over groups of variables. Whereas much effort has been devoted to developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure that computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach on various problems.
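For concreteness, here is a minimal sketch of the objects the abstract refers to, written in our own notation (w for the optimization variable, u for the point at which the proximal operator is evaluated, λ for the regularization weight, η_g for positive group weights, and 𝒢 for the set of possibly overlapping groups); none of these symbols appear in the abstract itself. The norm and its proximal problem read

\[
\Omega(w) \;=\; \sum_{g \in \mathcal{G}} \eta_g \,\|w_g\|_\infty,
\qquad
\operatorname{prox}_{\lambda\Omega}(u) \;=\; \arg\min_{w \in \mathbb{R}^p}\ \tfrac{1}{2}\|u - w\|_2^2 \;+\; \lambda\,\Omega(w).
\]

Since the dual norm of the l∞-norm is the l1-norm, this proximal problem admits a dual with one vector ξ^g per group, supported on the variables of g:

\[
\min_{\xi}\ \tfrac{1}{2}\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2
\quad \text{s.t.}\quad \|\xi^g\|_1 \le \lambda\,\eta_g
\ \ \text{and}\ \ \xi^g_j = 0 \ \text{for}\ j \notin g,
\]

with the primal solution recovered as \(w^\star = u - \sum_{g} \xi^{g\star}\). It is this dual, with its per-group l1 capacity constraints and quadratic fitting term, that can be read as a quadratic min-cost flow problem on a graph linking group nodes to the variables they contain, which is the connection the abstract exploits.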
