Gene expression profiling for prognosis using Cox regression

Given the rich biological information promised by microarray data, we can expect increasing demand for a robust, practical and well-tested methodology for patient prognosis based on gene expression data. In standard settings with few clinical predictors, such a methodology is provided by the Cox proportional hazards model, but no corresponding methodology is available to deal with the full set of genes in microarray data. Furthermore, we want the procedure to handle general survival data that include censored observations. Conceptually such a procedure can be constructed quite easily, but its implementation is not straightforward because of computational problems. We have developed an approach that relies on an extension of the Cox partial likelihood that allows random-effects parameters. In this approach, we use the full set of genes in the analysis and deal with survival data in the most general way. We describe the development of the model and the steps in its implementation, including a fast computational formula based on subsampling of the risk set and on the singular value decomposition. Finally, we illustrate the methodology using a data set obtained from a cohort of breast cancer patients.
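
To make the role of the random effects and of the singular value decomposition concrete, the following is a minimal sketch under stated assumptions, not necessarily the formulation used in the paper. It assumes independent normal random effects with a common variance, which turns the Cox likelihood into a ridge-penalized one. All symbols are introduced here for illustration: $X$ is the $n \times p$ expression matrix with rows $x_i^\top$, $\beta$ the gene effects, $\delta_i$ the event indicator, $R(t_i)$ the risk set at time $t_i$, and $\sigma^2$ the assumed random-effects variance.

```latex
% Penalized log partial likelihood, assuming beta ~ N(0, sigma^2 I_p)
\ell_p(\beta)
  = \sum_{i=1}^{n} \delta_i \Bigl[ x_i^\top \beta
      - \log \sum_{j \in R(t_i)} \exp\bigl( x_j^\top \beta \bigr) \Bigr]
    - \frac{1}{2\sigma^2} \, \lVert \beta \rVert^2 .

% Thin SVD of the expression matrix when p >> n
X = U D V^\top, \qquad
U \in \mathbb{R}^{n \times n}, \quad
D \in \mathbb{R}^{n \times n}, \quad
V \in \mathbb{R}^{p \times n}, \quad
V^\top V = I_n .

% Restricting beta to the row space of X loses nothing, because any
% component orthogonal to it leaves X beta unchanged and only adds penalty
\beta = V \gamma
\quad\Longrightarrow\quad
X \beta = U D \gamma, \qquad
\lVert \beta \rVert^2 = \lVert \gamma \rVert^2 .
```

Under these assumptions the optimization over $p$ gene effects reduces to one over the $n$ coefficients $\gamma$, with $UD$ playing the role of the design matrix. Subsampling the risk set would further shorten the inner sum over $j \in R(t_i)$; how that is done in the paper is not specified in this abstract.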
