iOmicsPASS: a novel method for integration of multi-omics data over biological networks and discovery of predictive subnetworks

We developed iOmicsPASS, an intuitive method for network-based multi-omics data integration and detection of biological subnetworks for phenotype prediction. The method converts abundance measurements into co-expression scores over biological networks and uses a powerful phenotype prediction method adapted for network-wise analysis. Simulation studies show that the proposed data integration approach considerably improves the quality of predictions. We illustrate iOmicsPASS through the integration of quantitative multi-omics data using transcription factor regulatory networks and protein-protein interaction networks for cancer subtype prediction. Our analysis of breast cancer data identifies network signatures surrounding established markers of molecular subtypes. The analysis of colorectal cancer data highlights a protein interactome surrounding key proto-oncogenes as predictive features of subtypes, rendering them more biologically interpretable than features from approaches that integrate data without a priori relational information. However, the results indicate that current molecular subtyping is overly dependent on transcriptomic data and that crude integrative analysis fails to account for molecular heterogeneity in other -omics data. The analysis also suggests that tumor subtypes are not mutually exclusive and that future subtyping should therefore consider multiplicity in assignments. Availability: https://github.com/cssblab/iOmicsPASS
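The two steps named in the abstract (turning abundances into edge-level co-expression scores, then predicting phenotypes from those scores) can be illustrated with a minimal Python sketch. This is not the iOmicsPASS implementation. The edge score used here (the sum of the two nodes' standardized abundances) and the simplified shrunken-centroid classifier are assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def edge_scores(z, edges):
    """Turn node-level abundances into edge-level scores.

    z     : (samples x molecules) array of standardized abundances
    edges : list of (i, j) column-index pairs taken from a network
    The score z_i + z_j is an assumed form, used only for illustration.
    """
    i, j = np.array(edges).T
    return z[:, i] + z[:, j]

def soft_threshold(d, delta):
    """Shrink values toward zero by delta; values within delta become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_shrunken_centroids(X, y, delta):
    """Simplified shrunken centroids (no variance scaling, no class priors).

    Each class centroid is pulled toward the overall centroid; an edge
    whose centroid is shrunk all the way back in every class no longer
    helps discriminate between classes.
    """
    overall = X.mean(axis=0)
    centroids = {}
    for c in np.unique(y):
        d = X[y == c].mean(axis=0) - overall
        centroids[c] = overall + soft_threshold(d, delta)
    return centroids

def predict(X, centroids):
    """Assign each sample to the class with the nearest centroid."""
    labels = list(centroids)
    dists = np.stack(
        [((X - centroids[c]) ** 2).sum(axis=1) for c in labels], axis=1
    )
    return np.array(labels)[dists.argmin(axis=1)]

# Toy data: 4 samples, 3 molecules, 2 network edges (all values made up).
z = np.array([[2.0, 2.0, 0.0],
              [1.5, 2.5, 0.0],
              [-2.0, -2.0, 0.0],
              [-1.5, -2.5, 0.0]])
y = np.array(["A", "A", "B", "B"])
X = edge_scores(z, [(0, 1), (1, 2)])
model = fit_shrunken_centroids(X, y, delta=0.5)
predicted = predict(X, model)
```

In this sketch, a larger `delta` shrinks more edges back to the overall centroid, so the edges that survive form a smaller candidate subnetwork. The threshold would normally be chosen by cross-validation.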
