Sparse Regression Incorporating Graphical Structure Among Predictors

ABSTRACT With the abundance of high-dimensional data in various disciplines, sparse regularized techniques are very popular these days. In this article, we make use of the structure information among predictors to improve sparse regression models. Typically, such structure information can be modeled by the connectivity of an undirected graph using all predictors as nodes of the graph. Most existing methods use this undirected graph edge-by-edge to encourage the regression coefficients of corresponding connected predictors to be similar. However, such methods do not directly use the neighborhood information of the graph. Furthermore, if there are more edges in the predictor graph, the corresponding regularization term will be more complicated. In this article, we incorporate the graph information node-by-node, instead of edge-by-edge as used in most existing methods. Our proposed method is very general and it includes adaptive Lasso, group Lasso, and ridge regression as special cases. Both theoretical and numerical studies demonstrate the effectiveness of the proposed method for simultaneous estimation, prediction, and model selection. Supplementary materials for this article are available online.

[1]  E. Xing,et al.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network , 2009, PLoS genetics.

[2]  Jieping Ye,et al.  Feature grouping and selection over an undirected graph , 2012, KDD.

[3]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[4]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[5]  Daniel Percival Theoretical Properties of the Overlapping Groups Lasso , 2011, 1103.4614.

[6]  D. Mungas In-office mental status testing: a practical guide. , 1991, Geriatrics.

[7]  Xiaotong Shen,et al.  Simultaneous Grouping Pursuit and Feature Selection Over an Undirected Graph , 2013, Journal of the American Statistical Association.

[8]  Christos Davatzikos,et al.  Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: Results from ADNI , 2009, NeuroImage.

[9]  Wei Pan,et al.  Network‐Based Penalized Regression With Application to Genomic Data , 2013, Biometrics.

[10]  R. Tibshirani,et al.  Sparse inverse covariance estimation with the graphical lasso. , 2008, Biostatistics.

[11]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[12]  Hongzhe Li,et al.  In Response to Comment on "Network-constrained regularization and variable selection for analysis of genomic data" , 2008, Bioinform..

[13]  R. Tibshirani,et al.  Covariance‐regularized regression and classification for high dimensional problems , 2009, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[14]  Yufeng Liu,et al.  Variable Selection via A Combination of the L0 and L1 Penalties , 2007 .

[15]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[16]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[17]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[18]  T. Cai,et al.  A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation , 2011, 1102.2233.

[19]  Ron Wehrens,et al.  The pls Package , 2007 .

[20]  Patrick L. Combettes,et al.  Proximal Splitting Methods in Signal Processing , 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.

[21]  Lorenzo Rosasco,et al.  Proximal methods for the latent group lasso penalty , 2012, Computational Optimization and Applications.

[22]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[23]  S. Geer,et al.  Oracle Inequalities and Optimal Inference under Group Sparsity , 2010, 1007.1771.

[24]  Peng Zhao,et al.  On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..

[25]  S. Folstein,et al.  "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician. , 1975, Journal of psychiatric research.

[26]  E. Greenshtein Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint , 2006, math/0702684.

[27]  Ron Wehrens,et al.  The pls Package: Principal Component and Partial Least Squares Regression in R , 2007 .

[28]  Jian Huang,et al.  Penalized methods for bi-level variable selection. , 2009, Statistics and its interface.

[29]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[30]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[31]  Liu Xianming,et al.  A Time Petri Net Extended with Price Information , 2007 .

[32]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[33]  P. Bickel,et al.  SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR , 2008, 0801.1095.

[34]  Hansheng Wang,et al.  Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso , 2007 .

[35]  Wei Pan,et al.  Predictor Network in Penalized Regression with Application to Microarray Data” , 2009 .

[36]  L. Beckett,et al.  Annual Incidence of Alzheimer Disease in the United States Projected to the Years 2000 Through 2050 , 2001, Alzheimer disease and associated disorders.

[37]  H. Bondell,et al.  Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR , 2008, Biometrics.

[38]  Heinz H. Bauschke,et al.  Fixed-Point Algorithms for Inverse Problems in Science and Engineering , 2011, Springer Optimization and Its Applications.

[39]  Genevera I. Allen,et al.  Molecular pathway identification using biological network-regularized logistic models , 2013, BMC Genomics.

[40]  A. Rinaldo,et al.  On the asymptotic properties of the group lasso estimator for linear models , 2008 .

[41]  Daoqiang Zhang,et al.  Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease , 2012, NeuroImage.

[42]  Wenjiang J. Fu,et al.  Asymptotics for lasso-type estimators , 2000 .

[43]  C. Jack,et al.  Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment , 1999, Neurology.

[44]  M. Yuan,et al.  Model selection and estimation in the Gaussian graphical model , 2007 .

[45]  S. Geer,et al.  Adaptive Lasso for High Dimensional Regression and Gaussian Graphical Modeling , 2009, 0903.2515.