Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity.

We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is sequence-specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also sequence-specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but the pattern of conservation at individual residue positions is more complex. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, the conservation of amino acid residues that interact with DNA bases varies with the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove, where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition, and that the effects of amino acid mutations are most easily understood for these interactions.
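The comparison described above, conservation at backbone-contacting positions versus base-contacting positions, can be illustrated with a minimal sketch. The abstract does not state which conservation measure was used, so the entropy-based score below is an assumption. The function names, the toy alignment, and the contact position lists are all hypothetical and are not data from the study.

```python
import math
from collections import Counter

def column_conservation(column):
    """Score one alignment column as 1 - normalised Shannon entropy.

    1.0 means the column is invariant; lower values mean more variation.
    This measure is an assumption, not necessarily the one used in the study.
    """
    residues = [r for r in column if r != "-"]  # ignore gap characters
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)  # normalise by the 20 amino acid types

def mean_conservation(alignment, positions):
    """Average the column score over a set of alignment positions."""
    scores = [column_conservation([seq[i] for seq in alignment]) for i in positions]
    return sum(scores) / len(scores)

# Hypothetical aligned sequences for one family (illustration only).
alignment = [
    "RKAQN",
    "RKSQN",
    "RKTEN",
    "RKAHN",
]
backbone_contacts = [0, 1]  # hypothetical backbone-contacting columns
base_contacts = [3]         # hypothetical base-contacting column

backbone_score = mean_conservation(alignment, backbone_contacts)
base_score = mean_conservation(alignment, base_contacts)
print(f"backbone contacts: {backbone_score:.3f}")
print(f"base contacts:     {base_score:.3f}")
```

In this toy alignment the backbone-contacting columns are invariant while the base-contacting column varies, which is the pattern the abstract attributes to multi-specific families. A highly specific or non-specific family would be expected to show high scores at both sets of positions.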
