Boosting with structural sparsity: A differential inclusion approach

Abstract: Boosting, viewed as a gradient descent algorithm, is a popular method in machine learning. In this paper, a novel Boosting-type algorithm is proposed, based on restricted gradient descent with structural sparsity control, whose underlying dynamics are governed by differential inclusions. In particular, we present an iterative regularization path with structural sparsity, in which the parameter is sparse under some linear transform, based on variable splitting and the Linearized Bregman Iteration; hence the algorithm is called Split LBI. Despite its simplicity, Split LBI outperforms the popular generalized Lasso in both theory and experiments. A theory of path consistency is presented: equipped with proper early stopping, Split LBI may achieve model selection consistency under a family of Irrepresentable Conditions that can be weaker than the necessary and sufficient condition for the generalized Lasso. Furthermore, ℓ2 error bounds are given at the minimax optimal rates. The utility and benefit of the algorithm are illustrated by several applications, including image denoising, partial-order ranking of sports teams, and grouping of world universities from crowdsourced ranking data.
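
Below is a minimal sketch of the iteration just described, assuming the standard Split LBI updates (a gradient step on the dense parameter beta, a Linearized Bregman step with soft-thresholding on the split variable gamma). This is an illustrative reimplementation, not the authors' reference code; the splitting parameter nu, the step sizes alpha and kappa, the iteration count n_iter, and the inputs X, y, D are placeholder names chosen for the example.

import numpy as np

def split_lbi(X, y, D, nu=1.0, kappa=16.0, alpha=1e-3, n_iter=2000):
    """Iterate the Split LBI dynamics for the splitted loss
    (1/(2n))||y - X beta||^2 + (1/(2 nu))||D beta - gamma||^2,
    returning the regularization path of (beta_k, gamma_k)."""
    n, p = X.shape
    m = D.shape[0]
    beta = np.zeros(p)      # dense parameter, updated by plain gradient descent
    gamma = np.zeros(m)     # sparse variable living under the linear transform D
    z = np.zeros(m)         # sub-gradient variable driving gamma
    path = []
    for _ in range(n_iter):
        r = X @ beta - y
        s = D @ beta - gamma
        grad_beta = X.T @ r / n + D.T @ s / nu
        grad_gamma = -s / nu
        beta = beta - kappa * alpha * grad_beta    # gradient step on beta
        z = z - alpha * grad_gamma                 # Linearized Bregman step on z
        gamma = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)  # soft-thresholding
        path.append((beta.copy(), gamma.copy()))
    return path

Early stopping along the returned path then plays the role of the regularization parameter, in the spirit of the path-consistency result stated in the abstract.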
