Evolutionary conservation of DNA-contact residues in DNA-binding domains

Background: DNA-binding proteins are central to gene regulation, and identifying DNA-binding domains helps in understanding the regulatory mechanisms of these proteins. In this study, we propose a method to determine whether a domain or a protein has DNA-binding capability by considering the evolutionary conservation of DNA-binding residues.

Results: Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate of less than 5% on 250 non-DNA-binding proteins. In addition, experimental results show that, by using the evolutionary conservation of DNA-contact residues, our method can distinguish the different DNA-binding behaviors of proteins in the same SCOP family.

Conclusion: This study shows that DNA-contact residues are conserved in DNA-binding domains. We conclude that members of the same subfamily bind DNA specifically and that members of different subfamilies often recognize different DNA targets. Additionally, we observe co-evolution of DNA-contact residues and the interacting DNA base pairs.
