Identification of biomarker‐by‐treatment interactions in randomized clinical trials with survival outcomes and high‐dimensional spaces

Stratified medicine seeks to identify biomarkers or parsimonious gene signatures that distinguish the patients who will benefit most from a targeted treatment. We evaluated 12 approaches in high‐dimensional Cox models in randomized clinical trials: penalization of the biomarker main effects and biomarker‐by‐treatment interactions (full‐lasso, three kinds of adaptive lasso, ridge+lasso, and group‐lasso); dimensionality reduction of the main‐effect matrix via linear combinations (principal components analysis [PCA]+lasso or partial least squares [PLS]+lasso); penalization of modified covariates or of the arm‐specific biomarker effects (two‐I model); gradient boosting; and a univariate approach with control of multiple testing. We compared these methods via simulations, evaluating their variable‐selection performance in null and alternative scenarios. We varied the number of biomarkers, of nonnull main effects, and of true biomarker‐by‐treatment interactions. We also proposed a novel measure of the interaction strength of the developed gene signatures. In the null scenarios, the group‐lasso, two‐I model, and gradient boosting performed poorly in the presence of nonnull main effects, but they performed well in alternative scenarios that also had high interaction strength. The adaptive lasso with grouped weights was too conservative. The modified covariates, PCA+lasso, PLS+lasso, and ridge+lasso approaches performed moderately. The full‐lasso and the adaptive lassos performed well, except for the full‐lasso when only nonnull main effects were present. The univariate approach performed poorly in alternative scenarios. Finally, we illustrate the methods using gene expression data from 614 breast cancer patients treated with adjuvant chemotherapy.
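To make the inputs of some of these approaches concrete, the following is a minimal sketch in plain Python; it is an illustration, not the authors' implementation. It assumes a 0/1 treatment coding and shows (i) the design matrix used when both main effects and biomarker‐by‐treatment interactions are penalized (treatment column, p main‐effect columns, p interaction columns), (ii) the modified covariates of Tian et al., W = X · T/2 with T recoded to −1/+1, which a Cox model would then use without main effects, and (iii) a Benjamini-Hochberg step‐up rule for the univariate approach. The abstract does not name the multiple‐testing procedure, so Benjamini-Hochberg is an assumption here, and the function names are hypothetical. The per‐biomarker interaction p‐values would come from separate Cox fits, which are not shown.

```python
from typing import List, Sequence


def _check_treatment(treat: Sequence[int]) -> None:
    # Assumption: treatment is coded 0 (control) / 1 (experimental arm).
    if any(t not in (0, 1) for t in treat):
        raise ValueError("treatment must be coded 0/1")


def full_lasso_design(X: Sequence[Sequence[float]], treat: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: rows of [T, X_1..X_p, X_1*T..X_p*T].

    A Cox lasso would penalize the main-effect and interaction columns.
    """
    _check_treatment(treat)
    rows = []
    for x_i, t_i in zip(X, treat):
        main = [float(v) for v in x_i]
        inter = [v * t_i for v in main]  # biomarker-by-treatment interactions
        rows.append([float(t_i)] + main + inter)
    return rows


def modified_covariates(X: Sequence[Sequence[float]], treat: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: modified covariates W = X * T / 2 with T in {-1, +1}."""
    _check_treatment(treat)
    rows = []
    for x_i, t_i in zip(X, treat):
        sign = 1.0 if t_i == 1 else -1.0  # recode 0/1 to -1/+1
        rows.append([float(v) * sign / 2.0 for v in x_i])
    return rows


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    k_max = 0
    # Step-up: find the largest rank k with p_(k) <= k * q / m.
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= rank * q / m:
            k_max = rank
    # Reject all hypotheses with rank <= k_max, even those failing their own threshold.
    return sorted(order[:k_max])


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, -1.0]]
    treat = [1, 0]
    print(full_lasso_design(X, treat))
    print(modified_covariates(X, treat))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In this sketch, the design for the full‐lasso doubles the number of biomarker columns, which is one reason high‐dimensional penalization is needed. The modified‐covariate design keeps p columns and encodes the treatment contrast directly in the covariates.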
