Network Flow Algorithms for Structured Sparsity

We consider a class of learning problems that involve a structured sparsity-inducing norm defined as a sum of ℓ∞-norms over groups of variables. Whereas much effort has been devoted to developing fast optimization methods when the groups are either disjoint or embedded in a hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure that computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach to various problems.
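To illustrate the proximal machinery underlying this line of work, consider the simplest building block: the proximal operator of a single ℓ∞-norm. By Moreau decomposition it equals the residual of a Euclidean projection onto an ℓ1-ball, which is computable in closed form by a sorting-based routine. The sketch below (in Python with NumPy; function names are ours) handles only this one-group case, not the overlapping-group setting that the paper's network-flow algorithm addresses:

```python
import numpy as np

def proj_l1_ball(v, radius):
    """Euclidean projection of v onto the l1-ball {x : ||x||_1 <= radius}.

    Sorting-based algorithm, O(n log n).
    """
    if np.sum(np.abs(v)) <= radius:
        return v.copy()  # already inside the ball
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    # Largest index rho where the soft-threshold level is still positive
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam * ||.||_inf at v, via Moreau decomposition:
    prox_{lam*||.||_inf}(v) = v - proj onto the l1-ball of radius lam.
    """
    return v - proj_l1_ball(v, lam)
```

For example, `prox_linf(np.array([3.0, 1.0]), 1.0)` returns `[2.0, 1.0]`: the largest coordinate is shrunk while smaller ones are left untouched, the characteristic behavior of the ℓ∞ penalty that, summed over overlapping groups, yields the structured sparsity patterns studied in the paper.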
