Bayesian methods for variable selection in survival models with application to DNA microarray data

Selecting significant genes from their expression patterns is a central problem in microarray analysis. Because the sample size is small and the number of variables (genes) is large, the selection process can be unstable. This paper considers a hierarchical Bayesian gene selection model for survival data. The popular models in survival analysis are usually well suited to data with few covariates and many observations (subjects). In a typical DNA microarray gene expression setting, by contrast, the number of covariates p exceeds the number of samples n. Given a vector of response values, which are times to event (death or censoring times), and p gene expressions (covariates), we address how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when n ≪ p. Rather than fixing the number of selected genes, we assign a prior distribution to this number. This adds flexibility by allowing constraints to be imposed, such as bounding the dimension via the prior, which in effect works as a penalty. We implement the methodology with a Markov chain Monte Carlo (MCMC) method and demonstrate it on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and breast carcinoma data.
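
To illustrate how a prior on the number of selected genes can act as a penalty, here is a minimal Python sketch. It rests on assumptions that the abstract does not state: each gene is included independently with a small probability `pi`, and the model size is bounded by `k_max`. The function names and default values are hypothetical and chosen only for illustration; the prior actually used in the paper may differ.

```python
import math


def log_prior_inclusion(k, p, pi=0.01, k_max=20):
    """Unnormalized log prior of one inclusion vector with k of p genes selected.

    Assumption (not from the paper): independent Bernoulli(pi) inclusion,
    truncated so that models with more than k_max genes get zero mass.
    """
    if not 0 <= k <= p:
        raise ValueError("k must lie between 0 and p")
    if k > k_max:
        return -math.inf  # hard bound on the dimension
    return k * math.log(pi) + (p - k) * math.log1p(-pi)


def size_penalty(k, p, pi=0.01, k_max=20):
    """Log prior odds of a k-gene model against the empty model.

    Under the assumed prior this equals k * log(pi / (1 - pi)), so every
    additional gene costs a fixed amount on the log scale.
    """
    return log_prior_inclusion(k, p, pi, k_max) - log_prior_inclusion(0, p, pi, k_max)
```

In an MCMC sampler over inclusion vectors, a term of this kind would be added to the log marginal likelihood when a move that adds or removes a gene is evaluated, so a small `pi` discourages large models and `k_max` rules them out entirely.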
