Gene selection in microarray survival studies under possibly non-proportional hazards

MOTIVATION: Univariate Cox regression (COX) is often used to select genes possibly linked to survival. Under non-proportional hazards (NPH), COX may under- or overestimate effects. The effect size measure $c = P(T_1 < T_0)$, i.e. the probability that a person randomly chosen from group $G_1$ dies earlier than a person randomly chosen from group $G_0$, does not depend on the proportional hazards (PH) assumption. Here we consider its generalization $c'$ to continuous data and investigate the suitability of $c'$ for gene selection.

RESULTS: Under PH, $c'$ is most efficiently estimated by COX. Under NPH, $c'$ can be obtained by weighted Cox regression (WHE) or by a novel method, concordance regression (CON). CON yielded the least biased and most stable estimates. We propose using $c'$ as a summary measure of effect size to rank genes, irrespective of the type of NPH or the censoring pattern.

AVAILABILITY: WHE and CON are available as R packages.

CONTACT: georg.heinze@meduniwien.ac.at

SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.
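For two groups with uncensored, continuous survival times, $c$ can be estimated as the Mann-Whitney proportion of ($G_1$, $G_0$) pairs in which the $G_1$ member dies first. Under PH with hazard ratio $\theta$ of $G_1$ relative to $G_0$, $S_1(t) = S_0(t)^\theta$, so $c = \int_0^\infty f_1(t) S_0(t)\,dt = \theta \int_0^\infty h_0(t) S_0(t)^{\theta+1}\,dt = \theta/(1+\theta)$. The Python sketch below checks this relationship on simulated exponential data. It is only an illustration under stated assumptions: it ignores censoring, covers only the two-group $c$ (not the continuous generalization $c'$), and is not the COX, WHE or CON estimator; the function names and simulation settings ($\theta = 2$, 2000 subjects per group, exponential hazards) are hypothetical.

```python
import bisect
import random


def concordance_probability(t1, t0):
    """Empirical c = P(T1 < T0) from two samples of uncensored survival times.

    For every pair (a, b) with a from G1 and b from G0, count 1 if a < b
    and 1/2 if a == b (Mann-Whitney convention), then divide by the number
    of pairs. Sorting t0 makes this O(n log n) instead of O(n^2).
    """
    s0 = sorted(t0)
    n0 = len(s0)
    total = 0.0
    for a in t1:
        left = bisect.bisect_left(s0, a)    # number of b < a
        right = bisect.bisect_right(s0, a)  # number of b <= a
        total += (n0 - right) + 0.5 * (right - left)  # b > a, plus half the ties
    return total / (len(t1) * n0)


def c_under_ph(theta):
    """Theoretical c for two groups under PH with hazard ratio theta (G1 vs G0)."""
    return theta / (1.0 + theta)


if __name__ == "__main__":
    rng = random.Random(1)
    theta = 2.0  # hypothetical hazard ratio of G1 relative to G0
    # Exponential times with constant hazards 1.0 (G0) and theta (G1):
    # the hazards are proportional by construction; no censoring is applied.
    t0 = [rng.expovariate(1.0) for _ in range(2000)]
    t1 = [rng.expovariate(theta) for _ in range(2000)]
    print("empirical c:", round(concordance_probability(t1, t0), 3))
    print("theta / (1 + theta):", round(c_under_ph(theta), 3))
```

With $\theta = 2$ the theoretical value is 2/3, and the empirical estimate should fall close to it. Under NPH the hazard ratio varies over time, so no single $\theta$ summarizes the effect, whereas $c$ remains defined as a pairwise probability; this is the property the abstract relies on.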