Group variable selection via convex log‐exp‐sum penalty with application to a breast cancer survivor study

In many scientific and engineering applications, covariates are naturally grouped. When group structure is available among the covariates, researchers are usually interested in identifying both the important groups and the important variables within the selected groups. Among existing group variable selection methods, some cannot perform within‐group selection. Others can perform both group and within‐group selection, but their objective functions are non‐convex, and this non‐convexity may require extra numerical effort. In this article, we propose a novel Log‐Exp‐Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within those groups. We develop an efficient group‐level coordinate descent algorithm to fit the model. We also derive non‐asymptotic error bounds and asymptotic group selection consistency for our method in the high‐dimensional setting, where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We apply the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help in designing intervention programs to improve the quality of life of breast cancer survivors.
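
To make the idea of a log‐exp‐sum penalty concrete, the sketch below evaluates one plausible form of it for a grouped coefficient vector. The abstract does not state the formula, so everything here is an assumption: the form lam * sum_k w_k * log(sum_j exp(alpha * |beta_kj|)), the function name `les_penalty`, and the parameters `lam`, `alpha`, and `weights` are all hypothetical. The exact definition is given in the full article. A function of this form is convex, because log‐sum‐exp is convex and nondecreasing in each argument and the absolute value is convex.

```python
import math


def les_penalty(beta, groups, lam=1.0, alpha=1.0, weights=None):
    """Hypothetical log-exp-sum group penalty (assumed form, not taken from the abstract).

    beta    : list of coefficients
    groups  : list of index lists, one per group
    lam     : overall penalty strength (assumed tuning parameter)
    alpha   : scale inside the exponential (assumed tuning parameter)
    weights : optional per-group weights; defaults to 1 for every group
    """
    if weights is None:
        weights = [1.0] * len(groups)
    total = 0.0
    for w, idx in zip(weights, groups):
        z = [alpha * abs(beta[j]) for j in idx]
        m = max(z)  # subtract the maximum before exponentiating to avoid overflow
        total += w * (m + math.log(sum(math.exp(v - m) for v in z)))
    return lam * total


# Usage: two groups of sizes 2 and 3.
groups = [[0, 1], [2, 3, 4]]
beta = [0.5, 0.0, 0.0, 0.0, 0.0]
print(les_penalty(beta, groups, lam=0.1, alpha=2.0))
```

Under this assumed form, the penalty couples the coefficients within a group through the logarithm while remaining a sum over groups, which is the kind of separable structure that a group‐level coordinate descent scheme can update one group at a time.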
