The InterPro protein families and domains database: 20 years on

Abstract

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models, provided by multiple databases, that describe protein families, domains or sites. InterPro combines signatures representing equivalent families, domains or sites, and adds information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report on the status of InterPro (version 81.0) and its associated software in its 20th year of operation, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
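To make the idea of combining equivalent signatures concrete, the following is a minimal Python sketch, under stated assumptions, of how matches from several source databases on one protein might be collapsed into InterPro entries. The `SignatureMatch` record, the `group_by_entry` and `entry_span` helpers, and every database name and accession below are hypothetical placeholders. This is not InterProScan's output format or the procedure InterPro uses to integrate signatures; it only illustrates the grouping relationship between signatures and entries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignatureMatch:
    member_db: str           # hypothetical source-database label
    signature: str           # hypothetical signature accession
    interpro: Optional[str]  # hypothetical integrating entry, or None if unintegrated
    start: int               # match start on the protein (1-based)
    end: int                 # match end on the protein (inclusive)


def group_by_entry(matches):
    """Group signature matches by the InterPro entry that integrates them.

    Signatures integrated into the same entry end up in one group;
    unintegrated signatures are kept under a key of their own.
    """
    groups = defaultdict(list)
    for m in matches:
        key = m.interpro if m.interpro is not None else f"unintegrated:{m.signature}"
        groups[key].append(m)
    return dict(groups)


def entry_span(group):
    """Return the (start, end) region covered by all matches in one group."""
    return min(m.start for m in group), max(m.end for m in group)


# Hypothetical matches on a single protein; none of these accessions are real.
matches = [
    SignatureMatch("DB_A", "SIG_A1", "IPR_X", 10, 250),
    SignatureMatch("DB_B", "SIG_B7", "IPR_X", 12, 255),
    SignatureMatch("DB_C", "SIG_C3", None, 300, 340),
]

groups = group_by_entry(matches)
for entry, group in groups.items():
    print(entry, [m.signature for m in group], entry_span(group))
```

Running it prints one line for the hypothetical entry `IPR_X`, which merges the two equivalent signatures over residues 10 to 255, and one line for the unintegrated signature `SIG_C3`.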
