TIGRFAMs and Genome Properties in 2013

TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins that have conserved their function since descending from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystem reconstruction across large numbers of genomes guides the selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
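As a rough illustration of how an annotation pipeline might apply per-model cutoff scores and model designations, the Python sketch below classifies HMM hit scores. It assumes each model carries two bit-score thresholds, a trusted cutoff (hits at or above it are members) and a noise cutoff (hits below it are not), with scores in between left uncertain. The `Model` class, its field names, the `classify` and `annotate` helpers, the example accession and all scores are hypothetical; this is not the JCVI pipeline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Membership(Enum):
    MEMBER = "member"          # score at or above the trusted cutoff
    UNCERTAIN = "uncertain"    # score between the noise and trusted cutoffs
    NON_MEMBER = "non-member"  # score below the noise cutoff


@dataclass(frozen=True)
class Model:
    """Hypothetical, simplified view of one protein family model."""
    accession: str
    model_type: str        # "equivalog", "subfamily" or "domain"
    protein_name: str      # annotation to transfer onto members
    trusted_cutoff: float  # bit score at or above which hits are members
    noise_cutoff: float    # bit score below which hits are non-members


def classify(model: Model, score: float) -> Membership:
    """Place one HMM hit score relative to the model's two cutoffs."""
    if score >= model.trusted_cutoff:
        return Membership.MEMBER
    if score < model.noise_cutoff:
        return Membership.NON_MEMBER
    return Membership.UNCERTAIN


def annotate(model: Model, score: float) -> Optional[Tuple[str, bool]]:
    """Return (name, is_specific) for a confident member, else None.

    Assumed rule: only hits at or above the trusted cutoff receive the
    model's annotation, and only equivalog models assert a specific,
    function-level name; subfamily and domain names are broader.
    """
    if classify(model, score) is not Membership.MEMBER:
        return None
    return model.protein_name, model.model_type == "equivalog"


# Hypothetical model; accession, name and scores are illustrative only.
m = Model("TIGR_EXAMPLE", "equivalog", "example synthase", 300.0, 150.0)
print(classify(m, 420.0).value)  # member
print(classify(m, 200.0).value)  # uncertain
print(annotate(m, 420.0))        # ('example synthase', True)
```

Keeping an explicit uncertain band between the two cutoffs lets a pipeline withhold a name rather than guess, which is the conservative behavior a specific equivalog annotation calls for.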

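The Genome Properties idea of turning computed evidence into a per-genome judgment about a subsystem can also be sketched in a few lines. This minimal Python example assumes a property is defined as a list of steps, each satisfied when the genome has a confident hit (for instance, at or above a model's trusted cutoff) to any one of the step's listed models, with some steps marked optional. The `Step` class, the `evaluate` function, the three outcome labels and the rule mapping found steps to them are simplifications chosen for illustration, not the database's actual evaluation rules, and all accessions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, Sequence


class PropertyState(Enum):
    YES = "yes"          # every required step has evidence
    PARTIAL = "partial"  # some, but not all, required steps have evidence
    NO = "no"            # no required step has evidence


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical property definition."""
    name: str
    evidence: AbstractSet[str]  # model accessions; any one hit suffices
    required: bool = True


def evaluate(steps: Sequence[Step], genome_hits: AbstractSet[str]) -> PropertyState:
    """Judge a subsystem from the models with confident hits in one genome."""
    required = [s for s in steps if s.required]
    # A step counts as found if any of its evidence models was hit.
    found = sum(1 for s in required if s.evidence & genome_hits)
    if required and found == len(required):
        return PropertyState.YES
    if found > 0:
        return PropertyState.PARTIAL
    return PropertyState.NO


# Hypothetical three-step pathway plus an optional step.
pathway = [
    Step("step 1", frozenset({"TIGR_A"})),
    Step("step 2", frozenset({"TIGR_B", "TIGR_B2"})),
    Step("step 3", frozenset({"TIGR_C"})),
    Step("regulator", frozenset({"TIGR_R"}), required=False),
]
print(evaluate(pathway, {"TIGR_A", "TIGR_B2", "TIGR_C"}).value)  # yes
print(evaluate(pathway, {"TIGR_A"}).value)                        # partial
print(evaluate(pathway, {"TIGR_R"}).value)                        # no
```

Allowing several alternative models per step reflects the fact that different, non-homologous protein families can fill the same role in a pathway; a hit to an optional step alone is not taken as evidence for the subsystem.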