STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets

Abstract: Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for a full understanding of biological phenomena, but the available information on protein–protein associations is incomplete and varies in annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein–protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, and also offers additional, new classification systems based on high-throughput text-mining and on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.
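
To make the idea of gene-set enrichment concrete, the following is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. It is not STRING's implementation, since the abstract does not specify the statistics used; the function names (`hypergeom_sf`, `enrich`), the gene identifiers, and the term name are all hypothetical and chosen for illustration only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: background size, K: background genes annotated with the term,
    n: input list size, k: input genes annotated with the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(input_genes, background, gene_sets):
    """Score each term (name -> set of genes) for over-representation.

    Returns (term, overlap, p_value) tuples sorted by ascending p-value.
    No multiple-testing correction is applied in this sketch.
    """
    background = set(background)
    query = set(input_genes) & background
    results = []
    for term, members in gene_sets.items():
        annotated = set(members) & background
        overlap = len(query & annotated)
        p = hypergeom_sf(overlap, len(background), len(annotated), len(query))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes, one term covering 5 of them.
background = [f"g{i}" for i in range(20)]
gene_sets = {"term_A": {"g0", "g1", "g2", "g3", "g4"}}
query = ["g0", "g1", "g2", "g3", "g10"]

for term, overlap, p in enrich(query, background, gene_sets):
    print(f"{term}: overlap={overlap}, p={p:.6f}")
```

In practice, p-values computed across many terms would need a multiple-testing correction, and a genome-wide input with per-gene values calls for rank-based statistics rather than a simple set overlap.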
