Improvement of cancer subtype prediction by incorporating transcriptome expression data and heterogeneous biological networks

Background: Identifying cancer subtypes is of great importance for cancer diagnosis and therapy. In recent years, a number of methods have been proposed to integrate multi-source data to identify cancer subtypes. However, few of them consider the regulatory associations between genome features or the contribution weights of different data-views in data integration. It is widely accepted that the regulatory associations between features play important roles in cancer subtype studies. In addition, different data-views may contribute differently to data integration for cancer subtype prediction.

Results: In this paper, we propose a method, CSPRV, to improve cancer subtype prediction by incorporating multi-source transcriptome expression data and heterogeneous biological networks. We extract multiple expression features for each genome element based on the regulatory associations in the heterogeneous biological networks, and use a generalized matrix correlation method (RV2) to predict the similarities between samples in each view of the expression data. We fuse the similarity information from multiple data-views according to different integration weights. Based on the integrated similarities between samples, we cluster the samples into different subtype groups. Comprehensive experiments on TCGA cancer datasets demonstrate that the proposed method can identify more clinically meaningful cancer subtypes than most existing methods.

Conclusions: Considering the regulatory associations between biological features and the contributions of different data-views is important for improving the understanding of cancer subtypes. The proposed method provides an open framework for incorporating transcriptome expression data and biological regulatory networks to predict cancer subtypes.
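
The similarity and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the CSPRV implementation. It assumes that each sample in a data-view is represented as a matrix (genome elements by extracted expression features), that RV2 refers to the modified RV coefficient computed after column-centering and with the diagonals of the cross-product matrices removed, and that fusion is a weighted average of the per-view similarity matrices. The function names `rv2`, `view_similarity`, and `fuse` are hypothetical.

```python
import numpy as np


def rv2(x, y):
    """Modified RV coefficient between two matrices that share the same rows.

    Assumption: rows are genome elements, columns are extracted features.
    """
    # Column-center so the coefficient reflects co-variation, not offsets.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    sx = x @ x.T
    sy = y @ y.T
    # The modified coefficient ignores the diagonals of the cross-product matrices.
    np.fill_diagonal(sx, 0.0)
    np.fill_diagonal(sy, 0.0)
    denom = np.sqrt(np.sum(sx * sx) * np.sum(sy * sy))
    return float(np.sum(sx * sy) / denom) if denom > 0 else 0.0


def view_similarity(sample_mats):
    """Pairwise RV2 similarity between all samples of one data-view."""
    n = len(sample_mats)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = rv2(sample_mats[i], sample_mats[j])
    return sim


def fuse(sims, weights):
    """Weighted average of per-view similarity matrices (weights are normalized)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(sims), axes=1)
```

The fused matrix returned by `fuse` could then be passed to any standard similarity-based clustering algorithm to obtain subtype groups; the choice of clustering algorithm and of the integration weights is left open here.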
