SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes and from large sequence collections such as UniProt. The database includes protein domain assignments for over 900 genomes and can be accessed at http://supfam.org/. Structural annotation is provided by hidden Markov models built on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level. We recently produced a new model library based on SCOP 1.73. Family-level assignments are also available. From the web site, users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find families or superfamilies that are over- or underrepresented within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures, and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.
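To make the idea of over- and underrepresentation concrete, the sketch below scores each superfamily by a two-proportion z-score. It compares the fraction of a genome's domain assignments that fall in that superfamily against the same fraction in a background set of genomes. This is an illustrative assumption, not necessarily the statistic SUPERFAMILY itself uses. The function name `representation_z_scores` and all counts are hypothetical; the labels are SCOP-style superfamily identifiers used only as keys.

```python
import math
from collections import Counter


def representation_z_scores(genome_counts, background_counts):
    """Two-proportion z-score per superfamily (illustrative statistic only).

    Positive: the superfamily is a larger share of the genome's domain
    assignments than of the background's (overrepresented).
    Negative: a smaller share (underrepresented).
    """
    n_g = sum(genome_counts.values())      # total domains in the genome
    n_b = sum(background_counts.values())  # total domains in the background
    scores = {}
    for sf in set(genome_counts) | set(background_counts):
        k_g = genome_counts.get(sf, 0)
        k_b = background_counts.get(sf, 0)
        # Pooled proportion under the null hypothesis of equal shares.
        pooled = (k_g + k_b) / (n_g + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_g + 1 / n_b))
        if se == 0:
            continue  # no variance (e.g. only one superfamily): skip
        scores[sf] = (k_g / n_g - k_b / n_b) / se
    return scores


# Hypothetical domain counts, keyed by SCOP-style superfamily identifiers.
genome = Counter({"a.4.5": 30, "c.37.1": 50, "b.1.1": 20})
background = Counter({"a.4.5": 100, "c.37.1": 800, "b.1.1": 100})
scores = representation_z_scores(genome, background)
for sf, z in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{sf}\t{z:+.2f}")
```

In this toy example, "a.4.5" makes up 30% of the genome's domains but only 10% of the background's, so it scores strongly positive. "c.37.1" scores strongly negative. A real analysis would also need to correct for testing many superfamilies at once.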
