Identification of functional gene modules by integrating multi-omics data and known molecular interactions

Multi-omics data integration has emerged as a promising approach for identifying patient subgroups. However, when it comes to grouping genes (or gene products) into co-expression modules, existing data integration methods suffer from two main drawbacks. First, most methods consider only genes or samples that are measured in every dataset. Second, they cannot use known molecular interactions (e.g., transcriptional regulatory interactions, protein–protein interactions and biological pathways) to assist module detection. Herein, we present a novel data integration framework, Correlation-based Local Approximation of Membership (CLAM), which provides two methodological innovations to address these limitations: 1) constructing a trans-omics neighborhood matrix by integrating multi-omics datasets and known molecular interactions, and 2) using a local approximation procedure to define gene modules from the matrix. Applying CLAM to human colorectal cancer (CRC) and mouse B-cell differentiation multi-omics data obtained from The Cancer Genome Atlas (TCGA), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), the Gene Expression Omnibus (GEO) and the ProteomeXchange database, we demonstrated its superior ability to recover biologically relevant modules and gene ontology (GO) terms. Further investigation of the CRC modules revealed numerous transcription factors and KEGG pathways that play crucial roles in CRC progression. Module-based survival analysis identified four survival-related networks in which pairwise gene correlations were significantly associated with CRC patient survival. Overall, this series of evaluations demonstrated the great potential of CLAM for identifying modular biomarkers for complex diseases.
We implemented CLAM as a user-friendly application, available at https://github.com/free1234hm/CLAM.
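
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the published CLAM algorithm. It assumes that a neighborhood matrix can be approximated by thresholding absolute Pearson correlations within each dataset over the union of all genes and then adding known interactions as extra edges, and that module membership can be approximated by repeatedly averaging the memberships of a gene's neighbors around fixed seed genes. The function names, the threshold of 0.8, the seed choice and the toy data are all hypothetical.

```python
import numpy as np

def neighborhood_matrix(datasets, interactions, threshold=0.8):
    """Build a symmetric 0/1 neighborhood matrix over the union of genes.

    datasets: list of dicts mapping gene -> 1-D array over that dataset's samples
              (datasets may cover different genes and different samples).
    interactions: iterable of (gene_a, gene_b) pairs of known interactions.
    """
    genes = sorted(set().union(*(d.keys() for d in datasets))
                   | {g for pair in interactions for g in pair})
    index = {g: i for i, g in enumerate(genes)}
    N = np.zeros((len(genes), len(genes)), dtype=int)

    # Correlation edges are computed within each dataset separately,
    # so a gene does not need to be measured in every dataset.
    for data in datasets:
        names = sorted(data)
        if len(names) < 2:
            continue
        corr = np.corrcoef(np.array([data[g] for g in names]))
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                if abs(corr[i, j]) >= threshold:
                    a, b = index[names[i]], index[names[j]]
                    N[a, b] = N[b, a] = 1

    # Known interactions (assumption: treated as unweighted extra edges).
    for gene_a, gene_b in interactions:
        if gene_a != gene_b:
            a, b = index[gene_a], index[gene_b]
            N[a, b] = N[b, a] = 1
    return genes, N

def propagate_memberships(N, seeds, n_iter=50):
    """Assign each gene to a seed by iterative neighbor averaging (illustrative only)."""
    n, k = N.shape[0], len(seeds)
    M = np.full((n, k), 1.0 / k)          # start from uniform memberships
    for col, s in enumerate(seeds):
        M[s] = 0.0
        M[s, col] = 1.0                   # seeds keep full membership in their own module
    for _ in range(n_iter):
        updated = M.copy()
        for i in range(n):
            neighbors = np.flatnonzero(N[i])
            if i not in seeds and neighbors.size:
                updated[i] = M[neighbors].mean(axis=0)
        M = updated
    return M.argmax(axis=1)

# Toy data: two datasets with different samples and only partly overlapping genes.
rna = {"A": np.array([1, 2, 3, 4, 5.0]),
       "B": np.array([2, 4, 6, 8, 10.5]),
       "C": np.array([5, 1, 4, 2, 3.0])}
protein = {"C": np.array([2, 1, 4, 3.0]),
           "D": np.array([1, 3, 2, 5.0]),
           "E": np.array([2, 6, 4, 9.0])}
known = [("C", "D")]                      # hypothetical regulatory interaction

genes, N = neighborhood_matrix([rna, protein], known)
labels = propagate_memberships(N, seeds=[genes.index("A"), genes.index("D")])
print(dict(zip(genes, labels.tolist())))
```

In this toy example, gene C is not strongly correlated with any other gene in either dataset and is linked to D only through the known interaction, which illustrates how prior knowledge can connect genes that correlation alone would leave isolated.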
