Fused Lasso Screening Rules via the Monotonicity of Subdifferentials

Fused Lasso is a popular regression technique that encodes the smoothness of the data. It has been applied successfully to many applications with a smooth feature structure. However, the computational cost of existing solvers for fused Lasso is prohibitive when the feature dimension is extremely large. In this paper, we propose novel screening rules that are able to quickly identify adjacent features with the same coefficients. As a result, the number of variables to be estimated can be significantly reduced, leading to substantial savings in computational cost and memory usage. To the best of our knowledge, the proposed approach is the first attempt to develop screening methods for the fused Lasso problem with a general data matrix. Our major contributions are: 1) we derive a new dual formulation of fused Lasso that comes with several desirable properties; 2) we show that the new dual formulation of fused Lasso is equivalent to that of the standard Lasso via two affine transformations; 3) we propose a novel framework for developing effective and efficient screening rules for fused Lasso via the monotonicity of the subdifferentials (FLAMS). Some appealing features of FLAMS are: 1) our methods are safe in the sense that the detected adjacent features are guaranteed to have the same coefficients; 2) the dataset needs to be scanned only once to run the screening, whose computational cost is negligible compared to that of solving the fused Lasso; 3) FLAMS is independent of the solvers and can be integrated with any existing solver. We have evaluated the proposed FLAMS rules on both synthetic and real datasets. The experiments indicate that FLAMS is very effective in identifying adjacent features with the same coefficients, and the speedup it yields can be orders of magnitude.
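
For reference, the fused Lasso estimator (Tibshirani et al., 2005) penalizes both the magnitudes of the coefficients and the differences between adjacent coefficients; in its standard form (the paper's exact notation may differ):

\[
\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda_1 \sum_{i=1}^{p} \lvert \beta_i \rvert \;+\; \lambda_2 \sum_{i=2}^{p} \lvert \beta_i - \beta_{i-1} \rvert .
\]

When the fusion penalty drives a difference \(\beta_i - \beta_{i-1}\) to zero, features \(i-1\) and \(i\) share a coefficient; the screening rules aim to certify such ties before the problem is solved, so that tied variables can be merged.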
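
As a minimal illustration of why this screening pays off, the Python sketch below collapses each detected run of adjacent equal-coefficient features into a single variable by summing the corresponding design-matrix columns. It is not the authors' FLAMS procedure: the detection step is assumed already done, and the function name `fuse_features` and the `(start, end)` group encoding are hypothetical.

```python
import numpy as np

def fuse_features(X, groups):
    """Collapse runs of adjacent features detected to share a coefficient.

    X      : (n, p) design matrix.
    groups : ordered list of half-open (start, end) index ranges partitioning
             0..p; each range is a run of features screened as tied.

    Returns the reduced (n, g) design matrix, whose j-th column is the sum of
    the columns in the j-th group, and the group sizes (the l1 penalty on a
    fused coefficient must be weighted by its group size).
    """
    cols = [X[:, s:e].sum(axis=1) for s, e in groups]   # sum tied columns
    sizes = np.array([e - s for s, e in groups])        # l1 weight per group
    return np.column_stack(cols), sizes

# Toy usage: suppose screening flagged features 1 and 2 as tied.
X = np.arange(12, dtype=float).reshape(3, 4)
X_reduced, sizes = fuse_features(X, [(0, 1), (1, 3), (3, 4)])
print(X_reduced.shape, sizes)  # (3, 3) [1 2 1]: 4 variables reduced to 3
```

The group-size weights matter because a run of k tied coefficients with common value c contributes k * lambda_1 * |c| to the l1 term, not lambda_1 * |c|; the savings scale with how many ties the screening certifies.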
