iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery

Computational tools for multiomics data integration have usually been designed for unsupervised detection of multiomics features explaining large phenotypic variations. To achieve this, some approaches extract latent signals in heterogeneous data sets from a joint statistical error model, while others use biological networks to propagate differential expression signals and find consensus signatures. However, few approaches directly consider molecular interaction as a data feature, the essential linker between different omics data sets. The increasing availability of genome-scale interactome data connecting different molecular levels motivates a new class of methods to extract interactive signals from multiomics data. Here we developed iOmicsPASS, a tool to search for predictive subnetworks consisting of molecular interactions within and between related omics data types in a supervised analysis setting. Based on user-provided network data and relevant omics data sets, iOmicsPASS computes a score for each molecular interaction, and applies a modified nearest shrunken centroid algorithm to the scores to select densely connected subnetworks that can accurately predict each phenotypic group. iOmicsPASS detects a sparse set of predictive molecular interactions without loss of prediction accuracy compared to alternative methods, and the selected network signature immediately provides mechanistic interpretation of the multiomics profile representing each sample group. Extensive simulation studies demonstrate clear benefit of interaction-level modeling. iOmicsPASS analysis of TCGA/CPTAC breast cancer data also highlights new transcriptional regulatory network underlying the basal-like subtype as positive protein markers, a result not seen through analysis of individual omics data.

[1]  Michael L. Gatza,et al.  Proteogenomics connects somatic mutations to signaling in breast cancer , 2016, Nature.

[2]  Ian M. Donaldson,et al.  iRefIndex: A consolidated protein interaction database with provenance , 2008, BMC Bioinformatics.

[3]  A. Nobel,et al.  Supervised risk predictor of breast cancer based on intrinsic subtypes. , 2009, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[4]  Jung Eun Shim,et al.  TRRUST: a reference database of human transcriptional regulatory interactions , 2015, Scientific Reports.

[5]  J. Marioni,et al.  Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets , 2018, Molecular systems biology.

[6]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[7]  Luis Serrano,et al.  Correlation of mRNA and protein in complex biological samples , 2009, FEBS letters.

[8]  Lihua Liu,et al.  TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies , 2004, Nucleic Acids Res..

[9]  Subha Madhavan,et al.  The CPTAC Data Portal: A Resource for Cancer Proteomics Research. , 2015, Journal of proteome research.

[10]  Qing Yang,et al.  ITFP: an integrated platform of mammalian transcription factors , 2008, Bioinform..

[11]  Tom Michoel,et al.  Integrative Multi-omics Module Network Inference with Lemon-Tree , 2014, PLoS Comput. Biol..

[12]  M. Ritchie,et al.  Methods of integrating data to uncover genotype–phenotype interactions , 2015, Nature Reviews Genetics.

[13]  Devin K. Schweppe,et al.  Architecture of the human interactome defines protein communities and disease networks , 2017, Nature.

[14]  David Haussler,et al.  Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM , 2010, Bioinform..

[15]  Roded Sharan,et al.  Network-Based Integration of Disparate Omic Data To Identify "Silent Players" in Cancer , 2015, PLoS Comput. Biol..

[16]  Andrew M. Gross,et al.  Network-based stratification of tumor mutations , 2013, Nature Methods.

[17]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[18]  Ralf Herwig,et al.  ConsensusPathDB: toward a more complete picture of cell biology , 2010, Nucleic Acids Res..

[19]  Adam B. Olshen,et al.  Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis , 2009, Bioinform..

[20]  C. Sander,et al.  Pattern discovery and cancer gene identification in integrated cancer genomic data , 2013, Proceedings of the National Academy of Sciences.

[21]  Lana X. Garmire,et al.  More Is Better: Recent Progress in Multi-Omics Data Integration Methods , 2017, Front. Genet..

[22]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Steven J. M. Jones,et al.  Comprehensive molecular portraits of human breast tumours , 2013 .

[24]  Aleix Prat Aparicio Comprehensive molecular portraits of human breast tumours , 2012 .

[25]  Florian Markowetz,et al.  Patient-Specific Data Fusion Defines Prognostic Cancer Subtypes , 2011, PLoS Comput. Biol..

[26]  E. Marcotte,et al.  Insights into the regulation of protein abundance from proteomic and transcriptomic analyses , 2012, Nature Reviews Genetics.

[27]  Rok Blagus,et al.  Class prediction for high-dimensional class-imbalanced data , 2010, BMC Bioinformatics.

[28]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[29]  Steven J. M. Jones,et al.  Comprehensive molecular portraits of human breast tumors , 2012, Nature.