Alternating linearization for structured regularization problems

We adapt the alternating linearization method for proximal decomposition to structured regularization problems, in particular to generalized lasso problems. The method is related to two well-known operator splitting methods, the Douglas-Rachford and Peaceman-Rachford methods, but it has descent properties with respect to the objective function. This is achieved by a special update test, which decides whether it is beneficial to make a Peaceman-Rachford step, either of the two possible Douglas-Rachford steps, or no step at all. The convergence mechanism of the method is related to that of bundle methods of nonsmooth optimization. We also discuss implementation for very large problems, using specialized algorithms and sparse data structures. Finally, we present numerical results for several synthetic and real-world examples, including a three-dimensional fused lasso problem, which illustrate the scalability, efficacy, and accuracy of the method.
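As a rough illustration of the problem class, the sketch below evaluates a generalized lasso objective in its commonly used form, 0.5 * ||b - A x||^2 + lam * ||R x||_1, with R taken to be the one-dimensional first-difference operator of the fused lasso. The function names, the dense list-of-rows matrix format, and the `accept_step` check are assumptions made for this example only. In particular, `accept_step` is a plain monotone-decrease check meant to convey the idea of a descent property; it is not the update test of the method described above.

```python
def first_differences(x):
    """Apply a 1-D fused lasso operator R, where (R x)_i = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def generalized_lasso_objective(A, b, x, lam, R=first_differences):
    """Return 0.5 * ||b - A x||^2 + lam * ||R x||_1.

    A is a dense matrix given as a list of rows (an assumption of this sketch;
    large problems would use sparse data structures instead).
    """
    residual = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    fit = 0.5 * sum(r * r for r in residual)
    penalty = lam * sum(abs(d) for d in R(x))
    return fit + penalty


def accept_step(objective, current, candidate):
    """Hypothetical stand-in for an update test: accept the candidate only
    if it does not increase the objective value."""
    return objective(candidate) <= objective(current)


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [1.0, 1.0, 3.0]

    def objective(x):
        return generalized_lasso_objective(A, b, x, lam=0.5)

    current = [1.0, 1.0, 2.0]
    candidate = [1.0, 1.0, 2.5]
    print(objective(current), objective(candidate))
    print(accept_step(objective, current, candidate))
```

Passing a different callable as `R` (for example, differences along the edges of a grid) gives other structured penalties within the same objective.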
