Classification of Breast Cancer Subtypes by combining Gene Expression and DNA Methylation Data

Selecting the most promising treatment strategy for breast cancer depends crucially on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases such as TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from the large bulk of genes that each make only a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to this so-called gene selection problem. However, data from other omics technologies, including DNA methylation, are also available. We hypothesize that combining methylation and gene expression data could by itself yield a substantially improved classification model, since the resulting model would reflect differences not only at the transcriptomic level but also at the epigenetic level. We compared random-forest-derived classification models based on gene expression data alone and on methylation data alone with a model based on the combined features and with a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification errors of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model. This was also reflected in the combined model, which mainly selected features from the gene expression data. However, the methylation model identified unique features that the gene expression model did not consider relevant, and these might provide deeper insight into breast cancer subtype differentiation at the epigenetic level.
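The comparison described above (one model per data type, one on the combined features, each scored by a bootstrap error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic, the subtype labels "A", "B" and "C" are placeholders, a nearest-centroid classifier stands in for the random forest to keep the sketch dependency-free, and the plain .632 rule is used as one common bootstrap estimator, since the abstract does not say which estimator was applied. All function names are hypothetical.

```python
import random
from collections import defaultdict

def fit_centroids(X, y):
    """Per-class mean vectors (a simple stand-in for a random forest)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))

def error_rate(centroids, X, y):
    """Fraction of samples that are misclassified."""
    return sum(predict(centroids, r) != t for r, t in zip(X, y)) / len(y)

def bootstrap_632_error(X, y, n_boot=50, seed=0):
    """Simplified .632 estimate: 0.368 * resubstitution + 0.632 * mean OOB error."""
    rng = random.Random(seed)
    n = len(y)
    resub = error_rate(fit_centroids(X, y), X, y)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # samples left out
        if not oob:
            continue
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        oob_errors.append(error_rate(model, [X[i] for i in oob], [y[i] for i in oob]))
    return 0.368 * resub + 0.632 * sum(oob_errors) / len(oob_errors)

def make_synthetic(n_per_class=20, seed=1):
    """Synthetic data: strong class signal in 'expression', weak in 'methylation'."""
    rng = random.Random(seed)
    expr, meth, labels = [], [], []
    for k, subtype in enumerate(["A", "B", "C"]):  # placeholder subtype labels
        for _ in range(n_per_class):
            expr.append([rng.gauss(3.0 * k, 1.0) for _ in range(5)])
            meth.append([rng.gauss(0.5 * k, 1.0) for _ in range(5)])
            labels.append(subtype)
    return expr, meth, labels

expr, meth, labels = make_synthetic()
combined = [e + m for e, m in zip(expr, meth)]  # concatenate the feature blocks

errors = {name: bootstrap_632_error(X, labels)
          for name, X in [("expression", expr),
                          ("methylation", meth),
                          ("combined", combined)]}
for name, err in errors.items():
    print(f"{name:12s} .632 bootstrap error: {err:.3f}")
```

Because the synthetic expression features are built to separate the classes far better than the synthetic methylation features, the ordering of the errors here is by construction and says nothing about real tumours. With real data, the classifier would be swapped for a random forest and the per-feature importances inspected to see which data type the combined model draws on.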
