Biclustering reveals breast cancer tumour subgroups with common clinical features and improves prediction of disease recurrence

Background: Many studies have revealed correlations between breast tumour phenotypes, variations in gene expression, and patient survival outcomes. The molecular heterogeneity between breast tumours revealed by these studies has allowed prediction of prognosis and has underpinned stratified therapy, where groups of patients with particular tumour types receive specific treatments. The molecular tests used to predict prognosis and stratify treatment usually utilise fixed sets of genomic biomarkers, with the same biomarker sets being used to test all patients. In this paper we suggest that instead of fixed sets of genomic biomarkers, it may be more effective to use a stratified biomarker approach, where optimal biomarker sets are automatically chosen for particular patient groups, analogous to the choice of optimal treatments for groups of similar patients in stratified therapy. We illustrate the effectiveness of a biclustering approach to select optimal gene sets for determining the prognosis of specific strata of patients, based on potentially overlapping, non-discrete molecular characteristics of tumours.

Results: Biclustering identified tightly co-expressed gene sets in the tumours of restricted subgroups of breast cancer patients. The co-expressed genes in these biclusters were significantly enriched for particular biological annotations and gene regulatory modules associated with breast cancer biology. Tumours identified within the same bicluster were more likely to present with similar clinical features. Bicluster membership combined with clinical information could predict patient prognosis in conditional inference tree and ridge regression class prediction models.

Conclusions: The increasing clinical use of genomic profiling demands identification of more effective methods to segregate patients into prognostic and treatment groups. We have shown that biclustering can be used to select optimal gene sets for determining the prognosis of specific strata of patients.
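
To make concrete what a "tightly co-expressed gene set within a restricted subgroup of tumours" means, the sketch below scores a candidate bicluster with the mean squared residue, a widely used coherence measure for biclusters. This is an illustration only: the abstract does not state which biclustering algorithm or score the authors used, and the function name, the toy matrix, and its values are all invented for the example. A score near zero means the chosen genes vary together across the chosen tumours.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix picked out by rows x cols.

    Illustrative coherence score; not necessarily the measure used in the paper.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing gene and tumour effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (invented values): rows are genes, columns are tumours.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
    [8.0, 1.0, 6.0, 2.0],
]

# Genes 0-2 move together across tumours 0-2, so this submatrix is coherent.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
# The full matrix mixes unrelated patterns and scores worse.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

A biclustering algorithm searches for gene and tumour subsets with low scores of this kind. Membership of a tumour in such a subset could then be used as a feature alongside clinical variables in a downstream prognostic model, as the Results describe.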
