Complex+: Aided Decision-Making for the Study of Protein Complexes

Proteins are the chief effectors of cell biology, and their functions are typically carried out in the context of multi-protein assemblies; large collections of such interacting assemblies are often referred to as interactomes. Knowing the constituents of protein complexes is therefore important for investigating their molecular biology. Many experimental methods can produce data useful for detecting and inferring the existence of physiological protein complexes. Each method has its own advantages and drawbacks, which affect the potential quality and utility of the data. Numerous informatic resources exist for the curation, integration, retrieval, and processing of protein interaction data. While each resource has different merits, none is definitive and few are easy to use, potentially limiting their effective use by non-experts. In addition, contemporary analyses suggest that we may still be decades away from a comprehensive map of a human protein interactome. Taken together, these limitations mean that we cannot yet fully realize the potential of a protein interactome perspective to improve biomedicine, which motivates the development of experimental and computational techniques that help investigators address them. Here, we present a resource intended to assist investigators in (i) navigating the cumulative knowledge concerning protein complexes and (ii) forming hypotheses concerning protein interactions that may yet lack conclusive evidence, thus (iii) directing future experiments to address knowledge gaps. To achieve this, we integrated multiple data types, covering different properties of protein interactions, from multiple sources and, after applying various methods of regularization, compared the computed protein interaction networks to those available in the EMBL-EBI Complex Portal, a manually curated, gold-standard catalog of macromolecular complexes.
As a result, our resource provides investigators with a reliably curated set of bona fide and candidate physical interactors for their protein or complex of interest, prompting due scrutiny and further validation where needed. We believe this information will empower a wider range of experimentalists to conduct focused protein interaction studies and to select research strategies that explicitly target missing information.
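To illustrate what comparing a computed interaction network against a curated catalog of complexes can look like, the following is a minimal sketch and not the procedure used for this resource. It assumes that each gold-standard complex is expanded into all co-member protein pairs and that a predicted network is scored by pairwise precision and recall; the function names, complex identifiers, and protein identifiers are hypothetical.

```python
from itertools import combinations


def complex_pairs(complexes):
    """Expand each complex (a set of protein IDs) into unordered co-member pairs."""
    pairs = set()
    for members in complexes.values():
        # Sorting gives each unordered pair one canonical (a, b) form.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def precision_recall(predicted_edges, gold_pairs):
    """Score predicted edges against gold-standard co-complex pairs."""
    predicted = {tuple(sorted(edge)) for edge in predicted_edges}
    true_positives = len(predicted & gold_pairs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy data (hypothetical IDs, not taken from any real catalog).
gold = {"CPX-A": {"P1", "P2", "P3"}, "CPX-B": {"P4", "P5"}}
predicted = [("P2", "P1"), ("P1", "P3"), ("P3", "P6"), ("P5", "P4")]

gold_pairs = complex_pairs(gold)
precision, recall = precision_recall(predicted, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this pairwise assumption, a predicted edge that is absent from the gold standard counts against precision, even though it may be a candidate interaction that simply lacks curated evidence; such edges are the ones the abstract describes as warranting further validation.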
