InterPro in 2017—beyond protein family and domain annotations

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

Silvio C. E. Tosatto | Neil D. Rawlings | Huaiyu Mi | Robert D. Finn | Marco Necci | Narmada Thanki | Damiano Piovesan | Cathy H. Wu | Terri K. Attwood | Rodrigo Lopez | Alan Bridge | Ioannis Xenarios | Shennan Lu | Aron Marchler-Bauer | Patricia C. Babbitt | Ian Sillitoe | Christine A. Orengo | Zsuzsanna Dosztányi | Gemma L. Holliday | Darren A. Natale | Hongzhan Huang | Ivica Letunic | Alex Bateman | Christian J. A. Sigrist | Alex L. Mitchell | Granger G. Sutton | Jaina Mistry | Paul D. Thomas | Xiaosong Huang | Julian Gough | Nicole Redaschi | Simon C. Potter | Hsin-Yu Chang | Sara El-Gebali | Lai-Su L. Yeh | Catherine Rivoire | Silvano Squizzato | Youngmi Park | David Haft | Peer Bork | Matthew Fraser | Gift Nuka | Sebastien Pesseat | Lorna Richardson | Amaia Sangrador-Vegas | Ben Smithers | Siew-Yit Yong
