Robust Prediction of t‐Year Survival with Data from Multiple Studies

Meta-analysis has recently been widely used to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies, where individual study sample sizes are not large relative to the number of parameters of interest. In this article, we develop robust prognostic rules for the prediction of t-year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for commonly used accuracy measures, including time-specific receiver operating characteristic curves and positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5-year survival of breast cancer patients based on five breast cancer genomic studies.
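
To make the accuracy measures named above concrete, the following is a minimal sketch of how the time-specific true and false positive rates and the positive and negative predictive values could be computed for a rule of the form "score at or above a cutoff". It is not the estimator proposed in the article: it assumes every event time is fully observed (no censoring) and that higher scores indicate higher risk. The function name `t_year_accuracy` and the toy data are hypothetical.

```python
import numpy as np

def t_year_accuracy(score, event_time, t, cutoff):
    """Accuracy of the rule `score >= cutoff` for predicting an event by time t.

    Simplifying assumption (not from the article): no censoring, so the
    status "event by time t" is known for every subject. Censored data
    need an adjusted estimator.
    """
    score = np.asarray(score, dtype=float)
    event_time = np.asarray(event_time, dtype=float)

    case = event_time <= t       # subject had the event by time t
    positive = score >= cutoff   # rule flags the subject as high risk

    tp = np.sum(positive & case)
    fp = np.sum(positive & ~case)
    fn = np.sum(~positive & case)
    tn = np.sum(~positive & ~case)

    # Ratios are undefined if a denominator is zero (e.g. no cases by t).
    return {
        "TPR": tp / (tp + fn),   # P(score >= c | T <= t)
        "FPR": fp / (fp + tn),   # P(score >= c | T >  t)
        "PPV": tp / (tp + fp),   # P(T <= t | score >= c)
        "NPV": tn / (tn + fn),   # P(T >  t | score <  c)
    }

# Hypothetical toy data: six subjects, event times in years.
score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
event_time = [2.0, 6.5, 3.0, 8.0, 9.0, 4.0]
acc = t_year_accuracy(score, event_time, t=5.0, cutoff=0.5)
```

Sweeping `cutoff` over the observed scores and plotting TPR against FPR traces out a time-specific ROC curve at the chosen t under the same no-censoring assumption.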
