STRING v9.1: protein-protein interaction networks, with increased coverage and integration

Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to another; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
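The abstract does not say which statistical test underlies the functional enrichment in point (iii). As a rough illustration only, the sketch below assumes one common choice: a one-sided hypergeometric test per annotation term, followed by Benjamini-Hochberg correction across terms. The function names and the example counts are hypothetical and are not taken from STRING.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated proteins in a
    network of n proteins drawn from a background of N proteins,
    K of which carry the annotation term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: background of 20 proteins, 5 annotated with a
    # term, network of 4 proteins, 2 of them annotated.
    p = hypergeom_sf(2, 20, 5, 4)
    print(f"raw p-value: {p:.4f}")
    print(benjamini_hochberg([p, 0.01, 0.03]))
```

In practice the background would be the full proteome of the organism, and one p-value would be computed per annotation term before correction.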
