The Pfam protein families database: towards a more sustainable future

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available, and Pfam annotations for individual UniProtKB sequences can still be retrieved. A small fraction of Pfam entries (1.6%) have no matches to reference proteomes but have been retained; we are working with UniProt to see whether sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. A new tool has been introduced that improves the facility to view the relationships between families within a clan.
