DBD - taxonomically broad transcription factor predictions: new content and functionality

DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The number of proteomes covered has grown from 150 in the initial version of DBD to over 700 in the current version. Every predicted TF must contain a significant match to a hidden Markov model (HMM) representing a sequence-specific DNA-binding domain family. TF predictions are accessible through http://transcriptionfactor.org, which now offers new search options. These include searching by gene name in model organisms and retrieving all proteins of a particular DBD family in a specific organism. We illustrate the use of this search facility by contrasting trends in DBD family occurrence across the tree of life, which highlights a clear partition between eukaryotic and prokaryotic DBD family expansions. The website has also been expanded with a dedicated page for each TF, containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare how the number of predicted TFs increases with proteome size in eukaryotes and prokaryotes. The number of TFs rises more slowly with proteome size in eukaryotes than in prokaryotes, which could be due to the presence of splice variants or to greater combinatorial control.
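A protein is admitted to DBD only if it has a significant match to an HMM for a sequence-specific DNA-binding domain family. The website then lets users narrow predictions by DBD family, organism and gene name. The following is a minimal in-memory sketch of that filtering logic. It assumes a hypothetical flat table of domain hits. `DomainHit`, `search_tfs`, the field names, the E-value cut-off of 1e-4 and the toy records are all illustrative assumptions. They are not the site's actual schema, interface, threshold or data.

```python
from dataclasses import dataclass

# Hypothetical record layout: field names, IDs and the E-value cut-off are
# illustrative assumptions, not DBD's real schema, data or threshold.
@dataclass(frozen=True)
class DomainHit:
    protein_id: str
    organism: str
    gene_name: str
    dbd_family: str
    e_value: float  # E-value of the match to the DBD family's HMM


E_VALUE_CUTOFF = 1e-4  # assumed significance threshold (hypothetical)


def search_tfs(hits, family=None, organism=None, gene_name=None,
               cutoff=E_VALUE_CUTOFF):
    """Return sorted IDs of proteins that have a significant DBD hit
    satisfying every filter that is not None."""
    matches = set()
    for h in hits:
        if h.e_value > cutoff:
            continue  # match not significant, so this hit cannot support a TF prediction
        if family is not None and h.dbd_family != family:
            continue
        if organism is not None and h.organism != organism:
            continue
        if gene_name is not None and h.gene_name.lower() != gene_name.lower():
            continue
        matches.add(h.protein_id)
    return sorted(matches)


# Toy records for illustration only (not taken from DBD).
hits = [
    DomainHit("P1", "Arabidopsis thaliana", "geneA", "AP2/ERF", 1e-20),
    DomainHit("P2", "Arabidopsis thaliana", "geneB", "AP2/ERF", 0.5),
    DomainHit("P3", "Escherichia coli", "geneC", "Winged helix", 1e-12),
    DomainHit("P4", "Arabidopsis thaliana", "geneD", "Homeodomain-like", 1e-8),
]

print(search_tfs(hits, family="AP2/ERF", organism="Arabidopsis thaliana"))
print(search_tfs(hits, gene_name="GENEC"))
```

In the first query, `P2` is excluded because its match is not significant under the assumed cut-off. Filtering individual hits rather than whole proteins means that a multi-domain protein is returned under each family in which it has a significant match.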
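A "slower rate of increase" in TFs with proteome size can be made concrete with a model. One common choice, assumed here and not necessarily the exact method behind the DBD comparison, is a power law T ≈ c·N^α. In this model, N is proteome size and T is the number of predicted TFs. It is fitted by least squares on log-log axes, and a slower rate of increase then shows up as a smaller exponent α. The function name `fit_power_law` is hypothetical. The data are synthetic and noise-free, with the exponents 2.0 and 1.3 chosen arbitrarily for illustration. They are not DBD results.

```python
import math


def fit_power_law(proteome_sizes, tf_counts):
    """Ordinary least-squares fit of log(T) = log(c) + alpha * log(N).

    Returns (c, alpha) for the power-law model T = c * N**alpha.
    """
    if len(proteome_sizes) != len(tf_counts) or len(proteome_sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in proteome_sizes]
    ys = [math.log(t) for t in tf_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be > 0: sizes must differ
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope on log-log axes
    c = math.exp(mean_y - alpha * mean_x)  # intercept back-transformed
    return c, alpha


# Synthetic, noise-free toy data (NOT DBD data); exponents chosen arbitrarily.
sizes = [1_000, 2_000, 4_000, 8_000, 16_000]
prok_like = [1e-3 * n ** 2.0 for n in sizes]
euk_like = [5e-2 * n ** 1.3 for n in sizes]

c_p, alpha_p = fit_power_law(sizes, prok_like)
c_e, alpha_e = fit_power_law(sizes, euk_like)
print(f"prokaryote-like exponent: {alpha_p:.2f}")
print(f"eukaryote-like exponent:  {alpha_e:.2f}")
```

With real counts the points scatter around the line, so the fitted exponent is an estimate, and comparing two groups would also call for confidence intervals on α.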
