Intra-Domain Residue Coevolution in Transcription Factors Contributes to DNA Binding Specificity

ABSTRACT Understanding the basis of the DNA-binding specificity of transcription factors (TFs) has been of long-standing interest. Despite extensive efforts to map millions of putative TF binding sequences, identifying the critical determinants of DNA binding specificity remains a major challenge. Residues in proteins coevolve as a result of a shared evolutionary history. However, it is unclear how coevolving residues in TFs contribute to DNA binding specificity. Here, we systematically collected publicly available data sets from multiple large-scale, high-throughput TF–DNA interaction screening experiments for the major TF families with large numbers of members. These families included the Homeobox, HLH, bZIP_1, Ets, HMG_box, ZF-C4, and Zn_clus TFs. We detected TF subclass-determining sites (TSDSs) and showed that TSDSs were more likely to coevolve with other TSDSs than with non-TSDSs, particularly in the Homeobox, HLH, Ets, bZIP_1, and HMG_box TF families. Using in silico modeling, we showed that mutating highly coevolving residues could significantly reduce the stability of the TF–DNA complex. Residues distant from the DNA interface also contributed to TF–DNA binding activity. Overall, our study provides evidence that coevolved residues are related to transcriptional regulation and offers insights into potential applications of engineered DNA-binding domains and proteins.
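The abstract does not state which coevolution score was used. As one illustration only, a widely used approach scores coevolution between two alignment columns by their mutual information (MI), corrected with the average product correction (APC; Dunn et al., 2008) to reduce background signal from phylogeny and column entropy. The sketch below is a minimal pure-Python version under that assumption; the function names and the toy alignment are hypothetical, and it omits the sequence weighting and gap handling that dedicated coevolution tools apply.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Residues at alignment column i (all sequences assumed equal length)."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of the alignment."""
    n = len(msa)
    ci = Counter(column(msa, i))
    cj = Counter(column(msa, j))
    cij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), n_ab in cij.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((ci[a] / n) * (cj[b] / n)))
    return mi


def mi_apc(msa):
    """Return {(i, j): MI(i, j) - MI(i, .) * MI(j, .) / mean MI} for i < j.

    MI(k, .) is the mean MI of column k with every other column; the
    subtracted product term is the average product correction (APC).
    Requires at least two columns.
    """
    length = len(msa[0])
    mi = {p: mutual_information(msa, *p) for p in combinations(range(length), 2)}
    col_mean = []
    for k in range(length):
        vals = [v for (i, j), v in mi.items() if k in (i, j)]
        col_mean.append(sum(vals) / len(vals))
    overall = sum(mi.values()) / len(mi)
    return {
        (i, j): v - (col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0)
        for (i, j), v in mi.items()
    }
```

For a toy alignment such as `["ACDE", "ACDF", "GKDE", "GKDF"]`, columns 0 and 1 vary together perfectly, so the pair `(0, 1)` receives the highest corrected score, while the invariant column 2 scores zero with every partner. Ranking pairs by this score and keeping the top fraction is one plausible way to define coevolving residue pairs, though the threshold used in the study is not given here.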
IMPORTANCE While unraveling the DNA-binding specificity of TFs is key to understanding the basis and molecular mechanisms of gene expression regulation, identifying the critical determinants of DNA binding specificity remains a major challenge. In this study, we provided evidence that coevolving residues in TF domains contribute to DNA binding specificity. We demonstrated that TF subclass-determining sites (TSDSs) were more likely to coevolve with other TSDSs than with non-TSDSs. Mutation of coevolving residue pairs (CRPs) could significantly reduce the stability of the TF–DNA complex, and even residues distant from the DNA interface contribute to TF–DNA binding activity. Collectively, our study expands our knowledge of the interactions among coevolved residues in TFs, their tertiary contacts, and their functional importance in the fine-tuning of transcriptional regulation. Understanding the impact of coevolving residues in TFs will help clarify the details of transcriptional gene regulation and advance the application of engineered DNA-binding domains and proteins.
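The central comparison, whether TSDSs coevolve with other TSDSs more often than with non-TSDSs, can be framed as a 2×2 contingency test. The abstract does not say which statistical test was applied; the sketch below assumes a one-sided Fisher's exact test, and the names `fisher_greater` and `tsds_enrichment`, the input format, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for [[a, b], [c, d]] (odds ratio > 1).

    Sums hypergeometric probabilities of tables at least as extreme as the
    observed count a, holding the row and column margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    return sum(
        comb(row1, x) * comb(n - row1, col1 - x) / denom
        for x in range(a, min(row1, col1) + 1)
    )


def tsds_enrichment(n_positions, tsds, crps):
    """Compare coevolution rates of TSDS-TSDS vs TSDS-nonTSDS position pairs.

    tsds: positions labeled as TSDSs; crps: coevolving residue pairs.
    Returns ((a, b, c, d), p) where row 1 is TSDS-TSDS pairs, row 2 is
    TSDS-nonTSDS pairs, and columns are (coevolving, not coevolving).
    """
    tsds = set(tsds)
    crps = {tuple(sorted(p)) for p in crps}
    counts = {"tt": [0, 0], "tn": [0, 0]}
    for i, j in combinations(range(n_positions), 2):
        n_t = (i in tsds) + (j in tsds)
        if n_t == 0:
            continue  # nonTSDS-nonTSDS pairs are outside this comparison
        key = "tt" if n_t == 2 else "tn"
        counts[key][0 if (i, j) in crps else 1] += 1
    (a, b), (c, d) = counts["tt"], counts["tn"]
    return (a, b, c, d), fisher_greater(a, b, c, d)
```

With six positions, TSDSs at `{0, 1, 2}`, and coevolving pairs `{(0, 1), (1, 2), (0, 2), (3, 4)}`, all three TSDS–TSDS pairs coevolve while none of the nine TSDS–non-TSDS pairs do, giving the table `(3, 0, 0, 9)` and p = 1/220. In practice `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided test; the pure-Python version is shown only to keep the example dependency-free.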