Unified feature association networks through integration of transcriptomic and proteomic data

High-throughput multi-omics studies, and network analyses of the resulting multi-omic data, have rapidly expanded their impact over the last 10 years. Because biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities composed of nodes of a single type. We describe here network-building methods that can maximize the number of edges between nodes of different data types. The result is integrated networks: networks in which a large number of edges link nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, the random forest method GENIE3 produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated and receptor-mediated Dengue virus cell invasion. A number of functional pathways showed centrality differences between the two networks; for example, genes responding to both GM-CSF and IL-4 had higher centrality in the antibody-mediated network than in the receptor-mediated network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are among the first to specifically highlight and address the challenges of assembling such multi-omic networks and of inferring the greatest number of interactions from different data types.
The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.
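The abstract does not say exactly how the integration of a network is quantified. One simple reading, assumed here, is the fraction of inferred edges whose two endpoints are of different -omic types. The minimal sketch below scores each inference method on its k highest-scoring edges (the fixed edge budget k is also an assumption) and ranks the methods by that fraction. The node names, method names and edge scores are made-up toy values, not results from the study, and the scoring is not necessarily the authors' own measure.

```python
from collections import Counter

# Toy inputs (hypothetical): each node is labelled with its omic type, and each
# inference method returns a list of (node_u, node_v, edge_score) tuples.
node_type = {
    "g1": "transcript",
    "g2": "transcript",
    "p1": "protein",
    "p2": "protein",
}

method_edges = {
    "method_A": [("g1", "g2", 0.90), ("p1", "p2", 0.85), ("g1", "p1", 0.30)],
    "method_B": [("g1", "p1", 0.60), ("g2", "p2", 0.50), ("g1", "g2", 0.40)],
}


def top_k_edges(weighted_edges, k):
    """Keep the k highest-scoring edges, dropping the scores."""
    ranked = sorted(weighted_edges, key=lambda edge: edge[2], reverse=True)
    return [(u, v) for u, v, _ in ranked[:k]]


def integration_score(edges, node_type):
    """Fraction of edges whose two endpoints have different omic types."""
    if not edges:
        return 0.0
    cross = sum(1 for u, v in edges if node_type[u] != node_type[v])
    return cross / len(edges)


def edge_type_counts(edges, node_type):
    """Count edges per unordered pair of node types, e.g. (protein, transcript)."""
    return Counter(tuple(sorted((node_type[u], node_type[v]))) for u, v in edges)


def rank_methods(method_edges, node_type, k):
    """Rank methods by the integration score of their top-k edges, highest first."""
    scores = {
        name: integration_score(top_k_edges(edges, k), node_type)
        for name, edges in method_edges.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank_methods(method_edges, node_type, k=2):
    print(f"{name}: {score:.2f}")
# prints:
# method_B: 1.00
# method_A: 0.00
```

Comparing methods at the same edge budget k keeps a method that simply returns more edges from looking more integrated, and `edge_type_counts` gives the per-type-pair breakdown that reveals whether a network is dominated by same-type (e.g. transcript-transcript) edges. A high cross-type fraction says nothing about whether those edges are correct, so accuracy still has to be assessed separately.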
