Module-Based Outcome Prediction Using Breast Cancer Compendia

Background The availability of large collections of microarray datasets (compendia), or knowledge about grouping of genes into pathways (gene sets), is typically not exploited when training predictors of disease outcome. These can be useful since a compendium increases the number of samples, while gene sets reduce the size of the feature space. This should be favorable from a machine learning perspective and result in more robust predictors. Methodology We extracted modules of regulated genes from gene sets, and compendia. Through supervised analysis, we constructed predictors which employ modules predictive of breast cancer outcome. To validate these predictors we applied them to independent data, from the same institution (intra-dataset), and other institutions (inter-dataset). Conclusions We show that modules derived from single breast cancer datasets achieve better performance on the validation data compared to gene-based predictors. We also show that there is a trend in compendium specificity and predictive performance: modules derived from a single breast cancer dataset, and a breast cancer specific compendium perform better compared to those derived from a human cancer compendium. Additionally, the module-based predictor provides a much richer insight into the underlying biology. Frequently selected gene sets are associated with processes such as cell cycle, E2F regulation, DNA damage response, proteasome and glycolysis. We analyzed two modules related to cell cycle, and the OCT1 transcription factor, respectively. On an individual basis, these modules provide a significant separation in survival subgroups on the training and independent validation data.

[1]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[2]  Pedro M. Domingos,et al.  Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier , 1996, ICML.

[3]  B. Stillman,et al.  Cold Spring Harbor Laboratory , 1995, Molecular medicine.

[4]  Christian A. Rees,et al.  Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[5]  P. Goss,et al.  Examination of POU homeobox gene expression in human breast cancer cells , 1999, International journal of cancer.

[6]  Christian A. Rees,et al.  Molecular portraits of human breast tumours , 2000, Nature.

[7]  D. Hanahan,et al.  The Hallmarks of Cancer , 2000, Cell.

[8]  R. Tibshirani,et al.  Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[10]  Yudong D. He,et al.  A Gene-Expression Signature as a Predictor of Survival in Breast Cancer , 2002 .

[11]  R. Tibshirani,et al.  Repeated observation of breast tumor subtypes in independent gene expression data sets , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12]  R. Tibshirani,et al.  Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data , 2004, PLoS biology.

[13]  Peter J Park,et al.  Improving identification of differentially expressed genes in microarray studies using information from public databases , 2004, Genome Biology.

[14]  D. Koller,et al.  A module map showing conditional activity of expression modules in cancer , 2004, Nature Genetics.

[15]  Jun Chen,et al.  Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes , 2004, BMC Bioinformatics.

[16]  Debashis Ghosh,et al.  Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data , 2004, BMC Genomics.

[17]  T. Barrette,et al.  ONCOMINE: a cancer microarray database and integrated data-mining platform. , 2004, Neoplasia.

[18]  J. Foekens,et al.  Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer , 2005, The Lancet.

[19]  I. Ernberg,et al.  DNA-dependent conversion of Oct-1 and Oct-2 into transcriptional repressors by Groucho/TLE , 2005, Nucleic acids research.

[20]  Cor J. Veenman,et al.  A protocol for building and evaluating predictors of disease state based on microarray data , 2005, Bioinform..

[21]  Ron Shamir,et al.  Integrative analysis of genome-wide experiments in the context of a large high-throughput data compendium , 2005, Molecular systems biology.

[22]  I. Ellis,et al.  A consensus prognostic gene expression classifier for ER positive breast cancer , 2006, Genome Biology.

[23]  Sébastien Bonnet,et al.  A mitochondria-K+ channel axis is suppressed in cancer and its normalization promotes apoptosis and inhibits cancer growth. , 2007, Cancer cell.