Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs

We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e., whether each connection is of an activating or a repressing type). In the coefficient estimation steps, an additional penalty is placed on top of the lasso penalty, similar to the approach of Li and Li (2008). We develop a fast implementation of the new method based on coordinate descent, and we show how the method can be applied to time-to-event data. In simulation studies, the new method yields good results with respect to sensitivity and specificity in identifying non-zero covariate coefficients, estimation of network connection signs, and prediction performance. We also apply the new method to two microarray time-to-event data sets, one from patients with ovarian cancer and one from patients with diffuse large B-cell lymphoma. The new method performs very well in both cases. The main application of the method is biomedical, but it may also be useful in other fields where network data are available.
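To make the alternating idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a squared-error loss (not the time-to-event setting), an unnormalized network penalty of the form (lam2/2) * sum over edges (j,k) of (beta_j - s_jk * beta_k)^2 added to the lasso penalty lam1 * ||beta||_1, and a simple sign rule s_jk = sign(beta_j * beta_k). The function names, the penalty scaling, and the sign rule are all illustrative assumptions and may differ from the method described in the paper.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_network_lasso(X, y, edges, lam1, lam2, n_outer=5, n_sweeps=200, tol=1e-10):
    """Hypothetical sketch: alternate coordinate descent on beta with a sign update.

    X: list of rows, y: list of responses, edges: list of (j, k) index pairs.
    Objective (assumed): (1/2n)||y - X beta||^2 + lam1 * ||beta||_1
                         + (lam2/2) * sum_{(j,k)} (beta_j - s_jk * beta_k)^2
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    signs = {e: 1.0 for e in edges}  # start by assuming activating connections
    nbrs = {j: [] for j in range(p)}
    for (j, k) in edges:
        nbrs[j].append((k, (j, k)))
        nbrs[k].append((j, (j, k)))
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_outer):
        # Coefficient step: coordinate descent with the signs held fixed.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                denom = col_sq[j] + lam2 * len(nbrs[j])
                if denom == 0.0:
                    continue  # all-zero column without neighbours: leave at 0
                # Correlation of column j with the partial residual.
                z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
                # Pull of the signed neighbouring coefficients.
                pull = sum(signs[e] * beta[k] for k, e in nbrs[j])
                new = soft_threshold(z + lam2 * pull, lam1) / denom
                delta = new - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= X[i][j] * delta
                    beta[j] = new
                max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        # Sign step (assumed rule): match each sign to the current coefficients.
        for (j, k) in edges:
            prod = beta[j] * beta[k]
            if prod != 0.0:
                signs[(j, k)] = 1.0 if prod > 0 else -1.0
    return beta, signs


# Toy data: covariates 0 and 1 are connected and act in opposite directions.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2 * row[0] - 2 * row[1] for row in X]
beta, signs = fit_network_lasso(X, y, edges=[(0, 1)], lam1=0.1, lam2=0.1)
print(beta, signs)
```

In this toy example the connection between covariates 0 and 1 starts out as activating and is flipped to repressing after the first coefficient step, after which the network penalty no longer pulls the two coefficients toward each other.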

[1]  Hongzhe Li,et al.  In Response to Comment on "Network-constrained regularization and variable selection for analysis of genomic data" , 2008, Bioinform..

[2]  L. Staudt,et al.  The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. , 2002, The New England journal of medicine.

[3]  Thomas A Gerds,et al.  Efron‐Type Measures of Prediction Error for Survival Analysis , 2007, Biometrics.

[4]  Harald Binder,et al.  Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models , 2008, BMC Bioinformatics.

[5]  B. Efron The Efficiency of Cox's Likelihood Function for Censored Data , 1977 .

[6]  Decision Systems.,et al.  Coordinate ascent for maximizing nondifferentiable concave functions , 1988 .

[7]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[8]  Susan A. Murphy,et al.  Monographs on statistics and applied probability , 1990 .

[9]  R. Tibshirani,et al.  Covariance‐regularized regression and classification for high dimensional problems , 2009, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[10]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[11]  R. Tibshirani,et al.  Generalized Additive Models , 1986 .

[12]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[13]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[14]  R. Tibshirani,et al.  Pathwise coordinate optimization , 2007, 0708.1485.

[15]  P. Green Iteratively reweighted least squares for maximum likelihood estimation , 1984 .

[16]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[17]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[18]  N. Breslow,et al.  Analysis of Survival Data under the Proportional Hazards Model , 1975 .

[19]  Anne Lohrli Chapman and Hall , 1985 .

[20]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[21]  D. R. Cox  Regression Models and Life-Tables , 1972 .

[22]  Wei Pan,et al.  BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm612 Systems biology , 2022 .

[23]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[24]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[25]  R. Tibshirani,et al.  Improvements on Cross-Validation: The 632+ Bootstrap Method , 1997 .

[26]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[27]  Harald Binder,et al.  Parallelized prediction error estimation for evaluation of high-dimensional models , 2009, Bioinform..

[28]  Harald Binder,et al.  A general, prediction error‐based criterion for selecting model complexity for high‐dimensional survival models , 2010, Statistics in medicine.

[29]  Harald Binder,et al.  Incorporating pathway information into boosting estimation of high-dimensional risk prediction models , 2009, BMC Bioinformatics.

[30]  Harald Binder,et al.  Assessment of survival prediction models based on microarray data , 2007, Bioinform..

[31]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[32]  Jeffrey T. Chang,et al.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies , 2006, Nature.

[33]  E. Kaplan,et al.  Nonparametric Estimation from Incomplete Observations , 1958 .

[35]  Guido Schwarzer,et al.  Easier parallel computing in R with snowfall and sfCluster , 2009, R J..

[36]  M. Schumacher,et al.  Consistent Estimation of the Expected Brier Score in General Survival Models with Right‐Censored Event Times , 2006, Biometrical journal. Biometrische Zeitschrift.

[37]  Harald Binder,et al.  Adapting Prediction Error Estimates for Biased Complexity Selection in High-Dimensional Bootstrap Samples , 2008, Statistical applications in genetics and molecular biology.