Deep Learning for Integrated Analysis of Breast Cancer Subtype Specific Multi-omics Data

Breast cancer is a deadly disease that occurs worldwide and is the most commonly diagnosed cancer in women. Its detection remains a major challenge from both computational and biological points of view. Next Generation Sequencing (NGS) techniques have accelerated the mapping of human genomes, and advanced NGS studies reveal that multiple genetic molecules are responsible for breast cancer and its subtypes. However, the high volume of data produced by NGS techniques is difficult to study because of its high dimensionality and complexity, which makes the integrated study of multi-omics data one of the major challenges in medical science. This motivated us to study NGS-based high-throughput expression data of miRNAs and mRNAs together with the Beta values of DNA methylation of the corresponding mRNAs. First, these datasets, which together comprise 33564 features for 305 patients in five classes, viz. Luminal A, Luminal B, HER2-enriched, Basal-like and Control, are analysed in an integrated fashion using a deep learning technique to classify the breast cancer subtypes. Second, the results of the deep learning technique are analysed further to identify the deeply connected features (miRNA, mRNA or DNA methylation) that are pivotal in the classification of breast cancer subtypes and play a crucial role in their formation. For this purpose, a deep learning technique called a stacked autoencoder is used to encode the features into a low-dimensional space, and the encoded data are then fed to five well-known classifiers for classification. Moreover, the same encoded data are used to select potential features: the encoded data are multiplied with the original data, a one-sample t-test is applied, and the resulting p-values are Bonferroni-corrected. 
The results have been validated quantitatively and through biological significance analysis, in which the tumor suppressor genes TP53 and BRCA1 were identified. These genes are known to play a crucial role in breast cancer. The datasets, code and supplementary materials of this work are provided online at http://www.nitttrkol.ac.in/indrajit/projects/integrated-analysis-breastcancer-subtypes/.
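
The Bonferroni step mentioned in the feature selection can be illustrated with a minimal sketch. It is not the authors' code: the function name `bonferroni_select`, the significance level of 0.05 and the example p-values are all assumptions made for illustration, and the p-values are taken as given rather than computed from a one-sample t-test.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Bonferroni-adjust p-values and return the indices that stay significant.

    alpha=0.05 is an assumed threshold; the abstract does not state one.
    """
    m = len(p_values)  # number of simultaneous tests (one per feature)
    # Multiply each p-value by the number of tests and cap the result at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    # Keep the features whose adjusted p-value is below the threshold.
    selected = [i for i, p in enumerate(adjusted) if p < alpha]
    return adjusted, selected


# Hypothetical p-values for four features (not taken from the paper).
raw_p = [0.001, 0.02, 0.04, 0.5]
adjusted, selected = bonferroni_select(raw_p)
print(adjusted)
print(selected)
```

With tens of thousands of features the multiplier becomes large, so only features with very small raw p-values survive the correction.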
