Breast cancer patient stratification using a molecular regularized consensus clustering method.

Breast cancers are highly heterogeneous, with subtypes that differ in clinical outcomes including prognosis, response to treatment, and chances of recurrence and metastasis. An important task in personalized medicine is to determine the subtype of a breast cancer patient in order to provide the most effective treatment. To achieve this goal, integrative genomics approaches have recently been developed that draw on multiple modalities of large datasets, ranging from genotypes to multiple levels of phenotypes. A major challenge in integrative genomics is how to effectively integrate these modalities to stratify breast cancer patients. Consensus clustering algorithms have often been adopted for this purpose. However, existing consensus clustering algorithms are not suited to integrating clustering results obtained from a mixture of numerical and categorical data. In this work, we present a mathematical formulation for integrative clustering of multiple-source data, including both numerical and categorical data, to resolve this issue. Specifically, we formulate the problem as a novel consensus clustering method called Molecular Regularized Consensus Patient Stratification (MRCPS), based on an optimization process with regularization. Unlike traditional consensus clustering methods, MRCPS can automatically and simultaneously cluster both numerical and categorical data with any choice of similarity metric. We apply this new method to the TCGA breast cancer datasets and evaluate it using both statistical criteria and clinical relevance in predicting prognosis. The results demonstrate the superiority of this method in both the effectiveness of aggregation and the differentiation of patient outcomes. Our method, while motivated by breast cancer research, is nevertheless general and applicable to other integrative genomics studies.
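To illustrate the general idea of consensus clustering referred to above, the following is a minimal sketch of a generic co-association approach. It is not MRCPS: the abstract does not specify the regularized optimization, so none is attempted here. The function names, the 0.5 threshold, and the six-patient base partitions are all hypothetical and chosen only for illustration. Because the aggregation operates on cluster labels rather than on raw features, base partitions derived from numerical and categorical data can be combined in the same way.

```python
import numpy as np


def coassociation(partitions):
    """Return the fraction of base partitions in which each pair of
    patients falls in the same cluster (an n x n symmetric matrix)."""
    partitions = np.asarray(partitions)  # shape: (n_partitions, n_patients)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)


def consensus_labels(coassoc, threshold=0.5):
    """Assign consensus clusters as connected components of the graph that
    links two patients when they co-cluster in more than `threshold` of the
    base partitions. The threshold is an illustrative assumption."""
    n = coassoc.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(coassoc[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical base partitions of six patients (made-up labels).
base_partitions = [
    [0, 0, 0, 1, 1, 1],  # e.g. from clustering a numerical modality
    [0, 0, 1, 1, 1, 1],  # e.g. from grouping by a categorical modality
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, different label names
]

C = coassociation(base_partitions)
labels = consensus_labels(C)
print(labels)
```

Note that the third partition uses different label names than the first but encodes the same grouping; the co-association matrix is insensitive to such relabeling, which is one reason it is a common starting point for combining heterogeneous clusterings.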
