CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data

MOTIVATION: DNA microarrays allow the simultaneous measurement of thousands of gene expression levels in a single patient sample. Gene expression data have been shown to correlate with survival in several cancers. However, analysis of the data is difficult because typically at most a few hundred patients are available, which makes regression or classification models severely underdetermined. Several approaches exist to classify patients into different risk classes, but relatively little has been done on predicting actual survival times. We introduce CASPAR, a novel method to predict true survival times for the individual patient from microarray measurements. CASPAR is based on a multivariate Cox regression model embedded in a Bayesian framework. A hierarchical prior distribution on the regression parameters is specifically designed for the high-dimensional (large number of genes), low-sample-size settings that are typical of microarray measurements. This enables CASPAR to automatically select small subsets of the most informative genes for prediction.

RESULTS: The validity of the method is demonstrated on two publicly available datasets, one on diffuse large B-cell lymphoma (DLBCL) and one on adenocarcinoma of the lung. The method successfully identifies long and short survivors with high sensitivity and specificity. We compare our method with two alternative methods from the literature, and our approach gives superior results. In addition, we show that CASPAR can further refine predictions made using clinical scoring systems such as the International Prognostic Index (IPI) for DLBCL and clinical staging for lung cancer, thus providing an additional tool for the clinician. An analysis of the identified genes confirms previously published results and also reveals new candidate genes correlated with survival.
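
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions, not CASPAR's published specification. It combines a Cox partial likelihood (assuming no tied event times) with a normal-gamma hierarchical prior on each coefficient, beta_j | tau_j ~ N(0, 1/tau_j) and tau_j ~ Gamma(shape a, rate b). Integrating out tau_j gives a heavy-tailed Student-t prior that shrinks most gene coefficients toward zero and leaves a few large ones. The prior form, the hyperparameters a and b, and all function names (cox_partial_loglik, log_prior, fit_map, and so on) are hypothetical choices made for illustration. The sketch also finds only a local posterior mode by gradient ascent, whereas a full Bayesian treatment would sample the posterior, for example with Markov chain Monte Carlo.

```python
import numpy as np


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood; assumes no tied event times."""
    order = np.argsort(-time)               # decreasing time: risk set of row i is rows 0..i
    eta = X[order] @ beta                   # linear predictor x_i^T beta per patient
    d = event[order]                        # 1 = death observed, 0 = censored
    log_s0 = np.logaddexp.accumulate(eta)   # log of sum over the risk set of exp(eta_j)
    return float(np.sum(d * (eta - log_s0)))


def cox_partial_loglik_grad(beta, X, time, event):
    """Gradient: sum over deaths of x_i minus the risk-set weighted mean of x."""
    order = np.argsort(-time)
    Xs, d = X[order], event[order]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())              # shift by the max for numerical stability
    s0 = np.cumsum(w)                        # risk-set sums of weights
    s1 = np.cumsum(w[:, None] * Xs, axis=0)  # risk-set sums of weighted covariates
    return (d[:, None] * (Xs - s1 / s0[:, None])).sum(axis=0)


def log_prior(beta, a, b):
    """Log marginal prior (up to a constant) of beta_j | tau_j ~ N(0, 1/tau_j),
    tau_j ~ Gamma(shape=a, rate=b), i.e. a Student-t with 2a degrees of freedom."""
    return float(-(a + 0.5) * np.sum(np.log1p(beta**2 / (2.0 * b))))


def log_prior_grad(beta, a, b):
    return -(2.0 * a + 1.0) * beta / (2.0 * b + beta**2)


def log_posterior(beta, X, time, event, a=1.0, b=0.1):
    return cox_partial_loglik(beta, X, time, event) + log_prior(beta, a, b)


def grad_log_posterior(beta, X, time, event, a=1.0, b=0.1):
    return cox_partial_loglik_grad(beta, X, time, event) + log_prior_grad(beta, a, b)


def fit_map(X, time, event, a=1.0, b=0.1, n_iter=2000, tol=1e-9):
    """Local posterior mode by gradient ascent with a backtracking (Armijo) line search."""
    beta = np.zeros(X.shape[1])
    f = log_posterior(beta, X, time, event, a, b)
    for _ in range(n_iter):
        g = grad_log_posterior(beta, X, time, event, a, b)
        step = 1.0
        while step > 1e-12:
            cand = beta + step * g
            f_new = log_posterior(cand, X, time, event, a, b)
            if f_new >= f + 1e-4 * step * (g @ g):  # sufficient increase reached
                break
            step *= 0.5
        else:
            break                                  # no ascent step found: stop
        improvement = f_new - f
        beta, f = cand, f_new
        if improvement < tol:
            break
    return beta
```

With real data, X would be the (standardized) patients-by-genes expression matrix. A smaller b gives stronger shrinkage toward zero. Because the Student-t penalty is not concave, the mode found can depend on the starting point. This sketch also models no baseline hazard, so it only ranks patients by relative risk and does not predict survival times.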

Bioinformatics, 2007.