Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data

Background: An increasing number of studies have profiled tumor specimens using distinct microarray platforms and analysis techniques. With the accumulating amount of microarray data, one of the most intriguing yet challenging tasks is to develop robust statistical models to integrate the findings.

Results: By applying a two-stage Bayesian mixture modeling strategy, we were able to assimilate and analyze four independent microarray studies to derive an inter-study validated "meta-signature" associated with breast cancer prognosis. Combining multiple studies (n = 305 samples) on a common probability scale, we developed a 90-gene meta-signature, which was strongly associated with survival in breast cancer patients. The independent studies used different microarray platforms, including spotted cDNAs, Affymetrix GeneChip, and inkjet oligonucleotides, and the individually identified classifiers yielded gene sets predictive of survival in each study cohort. The study-specific gene signatures, however, had minimal overlap with each other and performed poorly in pairwise cross-validation. The meta-signature, on the other hand, accommodated such heterogeneity and achieved comparable or better prognostic performance than the individual signatures. Furthermore, compared with a global standardization method, the mixture-model-based data transformation demonstrated superior properties for data integration and provided a solid basis for building classifiers at the second stage. Functional annotation revealed that genes involved in cell cycle and signal transduction activities were over-represented in the meta-signature.

Conclusion: The mixture modeling approach unifies disparate gene expression data on a common probability scale, allowing robust, inter-study validated prognostic signatures to be obtained. With the emerging utility of microarrays for cancer prognosis, it will be important to establish paradigms to meta-analyze disparate gene expression data for prognostic signatures of potential clinical use.
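
The idea of placing measurements from different platforms on a common probability scale can be illustrated with a minimal sketch. This is not the authors' two-stage Bayesian model: it swaps in a plain two-component Gaussian mixture fitted by EM for a single gene, and the function name `posterior_high`, the simulated data, and all parameter values are hypothetical. It only shows how platform-specific units (log-ratios versus raw intensities) can be mapped to probabilities in [0, 1] that are comparable across studies.

```python
import numpy as np


def posterior_high(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM (an assumed stand-in
    model) and return each sample's posterior probability of belonging to
    the component with the higher mean."""
    x = np.asarray(x, dtype=float)
    # Initialise the component means from the lower and upper halves of the data.
    order = np.sort(x)
    half = len(x) // 2
    mu = np.array([order[:half].mean(), order[half:].mean()])
    sigma = np.full(2, x.std() + 1e-6)
    weight = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: update mixing weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        # Stop once the log-likelihood no longer improves.
        ll = float(np.log(total).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return resp[:, int(np.argmax(mu))]


# Simulated expression of one gene in two hypothetical studies measured in
# different units: 60 "low" and 40 "high" samples in each.
rng = np.random.default_rng(0)
study_a = np.concatenate([rng.normal(0.0, 0.5, 60), rng.normal(3.0, 0.5, 40)])
study_b = np.concatenate([rng.normal(200.0, 30.0, 60), rng.normal(900.0, 80.0, 40)])

# After the transformation both studies live on the same [0, 1] scale.
p_a = posterior_high(study_a)
p_b = posterior_high(study_b)
```

In this toy setting the transformed values `p_a` and `p_b` could be pooled and passed to a classifier, which loosely mirrors the role of the second stage described in the abstract.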
