The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets

Abstract

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions and functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context, and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring mode for physical interactions, and extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online at https://string-db.org/.
