Identification of Biomarkers for Prostate Cancer Prognosis Using a Novel Two-Step Cluster Analysis

Prognosis of prostate cancer is challenging because clinical variables such as Gleason score, metastasis stage, surgical margin status, seminal vesicle invasion status and preoperative prostate-specific antigen level give only an incomplete assessment. Whole-genome gene expression assays offer an opportunity to identify molecular indicators that predict disease outcomes. However, heterogeneity in the cell composition of tissue samples usually produces inconsistent results in cancer profiling studies. We developed a two-step strategy to identify prognostic biomarkers for prostate cancer that takes into account the variation due to mixed tissue samples. In the first step, an unsupervised EM clustering analysis was applied to each gene to cluster patient samples into subgroups based on the expression values of that gene. In the second step, genes were selected based on a χ² correlation analysis between the cluster indicators obtained in the first step and the observed clinical outcomes. Two simulation studies showed that the proposed method identified 30% more prognostic genes than traditional differential expression analysis methods such as SAM and LIMMA. We also analyzed a real prostate cancer expression data set using both the new method and the traditional methods. Pathway analysis showed that the genes identified with the new method are significantly enriched for pathways relevant to prostate cancer, such as the Wnt signaling pathway and the TGF-β signaling pathway. These genes were not detected by the traditional methods.
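The two steps described above can be sketched for a single gene as follows. This is a minimal illustration and not the authors' implementation: it assumes two clusters per gene, a one-dimensional Gaussian mixture fitted by EM, a binary outcome (for example relapse versus no relapse), and a Pearson χ² test on the resulting 2×2 table without continuity correction. The function names `em_two_clusters` and `chi2_2x2`, the synthetic data, and all parameter values are hypothetical.

```python
import math
import random


def em_two_clusters(values, n_iter=200, tol=1e-8):
    """Step 1 (sketch): fit a two-component 1-D Gaussian mixture by EM
    and return a hard cluster label (0 or 1) for each sample."""
    xs = sorted(values)
    n = len(xs)
    # Initialise the means at the lower and upper quartiles (an assumption).
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n or 1e-6
    var = [var_all, var_all]
    pi = [0.5, 0.5]
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to cluster 1.
        resp = []
        ll = 0.0
        for x in values:
            dens = [
                pi[k] * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1] + 1e-300
            resp.append(dens[1] / total)
            ll += math.log(total)
        # M-step: update weights, means and variances from the posteriors.
        for k in (0, 1):
            w = [r if k == 1 else 1.0 - r for r in resp]
            wk = sum(w) + 1e-12
            mu[k] = sum(wi * x for wi, x in zip(w, values)) / wk
            var[k] = max(
                sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, values)) / wk,
                1e-6,
            )
            pi[k] = wk / n
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return [1 if r > 0.5 else 0 for r in resp]


def chi2_2x2(labels, outcomes):
    """Step 2 (sketch): Pearson chi-square statistic and p-value (1 df)
    for the association between cluster labels and a binary outcome."""
    table = [[0, 0], [0, 0]]
    for lab, out in zip(labels, outcomes):
        table[lab][out] += 1
    n = len(labels)
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in row or 0 in col:
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Synthetic example (made-up data): 60 patients, 30 per outcome group.
# Only 20 of the 30 patients with outcome 1 show elevated expression,
# loosely mimicking a signal diluted by mixed tissue composition.
rng = random.Random(0)
outcomes = [0] * 30 + [1] * 30
expression = (
    [rng.gauss(0, 1) for _ in range(30)]
    + [rng.gauss(4, 1) for _ in range(20)]
    + [rng.gauss(0, 1) for _ in range(10)]
)

labels = em_two_clusters(expression)
stat, p_value = chi2_2x2(labels, outcomes)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

In a genome-wide setting the same two functions would be applied to every gene in turn, and genes would then be ranked or filtered by p-value, presumably with some multiple-testing adjustment; the abstract does not specify these details.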
