A Web-based classification system of DNA-binding protein families.

Rational classification of proteins encoded in sequenced genomes is critical for making genome sequences maximally useful for functional and evolutionary studies. DNA-binding proteins form one of the most populated and most studied protein families in the genomes of bacteria, archaea and eukaryotes, and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated, searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from SWISS-PROT release 38), each containing at least one DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. A single classification scheme that combines global similarities with functional motifs assigns the DNA-binding proteins to 33 unique classes, which helps to reveal family relationships comprehensively. To maximize the retrieval of family information, DnaProt contains a multiple alignment for each DNA-binding family, and the recognized motifs can be used as diagnostic functional fingerprints. All available structural representatives of each class are cross-referenced. The resource was developed as a Web-based management system that provides free online access to customized data sets. Entries are fully hyperlinked for easy retrieval of the original records from the source databases, and functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html) for online searching of a library of the specific patterns of the identified DNA-binding protein classes and for retrieval of individual entries.
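Pattern-based family assignment, as used for the PROSITE-defined families, can be illustrated with a short sketch. It is not the DnaProt implementation. It translates a pattern written in PROSITE syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) into a Python regular expression and reports which families' patterns occur in a sequence. The family names, the toy N-terminal pattern and the test sequences are hypothetical; the first pattern is modeled on the PROSITE C2H2 zinc-finger signature and is used only as an example. Anchors placed inside brackets (e.g. `[G>]`) and PRINTS-style multi-motif fingerprints are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supports x, [..], {..}, e(n) / e(n,m) repeats and the < / > anchors
    on whole elements; does not support '>' inside brackets such as [G>].
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:                              # e(n) or e(n,m) repeat count
            element, lo, hi = m.group(1), m.group(2), m.group(3)
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        if element == "x":
            core = "."                     # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                 # single residue or [..] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def classify(sequence: str, families: dict) -> list:
    """Return the names of the families whose pattern occurs in sequence."""
    return [name for name, pattern in families.items()
            if re.search(prosite_to_regex(pattern), sequence)]

# Hypothetical family table for illustration only.
FAMILIES = {
    "zinc_finger_C2H2_like": "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.",
    "toy_N_terminal_motif": "<M-x-{P}-K.",
}

if __name__ == "__main__":
    print(prosite_to_regex(FAMILIES["zinc_finger_C2H2_like"]))
    print(classify("MKCAACAAALAAAAAAAAHAAAHGS", FAMILIES))
```

Running the script prints `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H` and `['zinc_finger_C2H2_like']`. A sequence may match several patterns, so the function returns a list rather than a single family.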
