Thresholding-based Iterative Selection Procedures for Generalized Linear Models

High-dimensional data pose challenges in statistical learning and modeling. The predictors can often be naturally grouped, in which case pursuing between-group sparsity is desirable. Collinearity is common in real-world high-dimensional applications, where the popular ℓ1 technique suffers from both selection inconsistency and prediction inaccuracy. Moreover, the problems of interest often go beyond Gaussian models. To meet these challenges, nonconvex penalized generalized linear models with grouped predictors are investigated, and a simple-to-implement algorithm is proposed for computation. A rigorous theoretical result guarantees its convergence and provides a tight preliminary scaling. The framework accommodates grouped predictors and nonconvex penalties, including the discrete ℓ0 and 'ℓ0+ℓ2' type penalties. Penalty design and parameter tuning for nonconvex penalties are also examined. Applications to super-resolution spectrum estimation in signal processing and to cancer classification with joint gene selection in bioinformatics demonstrate the performance gains from nonconvex penalized estimation.
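To make the idea of a thresholding-based iterative selection procedure concrete, the following is a minimal sketch for the Gaussian linear special case: after a preliminary scaling that brings the spectral norm of the design below one, the estimate is updated by a gradient step followed by a group-wise thresholding step, here using an illustrative 'ℓ0+ℓ2'-type (hard-ridge) rule. The function names, the group encoding, and the tuning values lam and eta are assumptions for illustration, not the paper's exact specification; for a non-Gaussian GLM the residual y - X beta would be replaced by y - mu(X beta).

import numpy as np

def hard_ridge_threshold(z, lam, eta):
    # 'l0+l2'-type rule: zero out entries at or below lam, ridge-shrink the rest.
    return np.where(np.abs(z) > lam, z / (1.0 + eta), 0.0)

def group_threshold(z, groups, lam, eta=0.0):
    # Apply a hard-ridge-type rule group-wise via each group's l2 norm
    # (eta = 0 gives a pure group hard-thresholding rule).
    out = np.zeros_like(z)
    for g in groups:                                  # groups: list of index arrays
        if np.linalg.norm(z[g]) > lam:
            out[g] = z[g] / (1.0 + eta)
    return out

def tisp_linear(X, y, groups, lam, eta=0.0, max_iter=1000, tol=1e-8):
    # Preliminary scaling: divide X and y by a constant slightly above ||X||_2
    # so the scaled design has spectral norm below 1; lam then refers to the
    # scaled problem.
    k0 = np.linalg.norm(X, 2) * 1.01
    Xs, ys = X / k0, y / k0
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = beta + Xs.T @ (ys - Xs @ beta)            # gradient step
        beta_new = group_threshold(z, groups, lam, eta)  # thresholding step
        if np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta)):
            beta = beta_new
            break
        beta = beta_new
    return beta

# Illustrative usage on synthetic data; in practice lam and eta would be tuned,
# e.g., by a BIC-type criterion or cross-validation.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:5] = 2.0                                    # first group is active
y = X @ beta_true + 0.5 * rng.standard_normal(100)
groups = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 20)]
beta_hat = tisp_linear(X, y, groups, lam=0.3, eta=0.01)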
