FIGfams: yet another set of protein families

We present FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison. The manual curation is carried out using the Subsystem approach, ensuring a previously unattained degree of throughput and consistency. FIGfams are based on over 950 000 manually annotated proteins spanning many hundreds of Bacteria and Archaea. Associated with each FIGfam is a two-tiered, rapid and accurate decision procedure for determining whether a new protein belongs to the family. FIGfams are freely available under an open source license and can be downloaded at ftp://ftp.theseed.org/FIGfams/. The web site for FIGfams is http://www.theseed.org/wiki/FIGfams/
