Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes

Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge of where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) was developed to identify the binding specificities of TFs. The method constructs homology models of TFs bound to DNA and, after optimization in a molecular mechanics force field, assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and depend primarily on the sequence identity between the aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assays. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system, and connective tissue development and function.
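
The step of scoring all possible DNA sequences and condensing the best ones into a motif can be sketched as follows. This is a minimal illustration, not the TF2DNA implementation: the function `toy_energy` is a hypothetical stand-in for the knowledge-based potential (it simply counts mismatches to an arbitrary illustrative site), and the names `all_kmers`, `build_pfm`, `consensus`, and the cutoff `top_n` are assumptions introduced here.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Yield every DNA sequence of length k (4**k sequences in total)."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)


def toy_energy(seq, preferred="TAAT"):
    """Hypothetical binding score: mismatches to an arbitrary site.

    Lower is better. A real pipeline would instead evaluate a
    structure-based potential on a modeled TF-DNA complex.
    """
    return sum(a != b for a, b in zip(seq, preferred))


def build_pfm(k, score, top_n):
    """Rank all k-mers by score and count bases per position in the top_n."""
    ranked = sorted(all_kmers(k), key=score)
    pfm = [{b: 0 for b in BASES} for _ in range(k)]
    for seq in ranked[:top_n]:
        for i, base in enumerate(seq):
            pfm[i][base] += 1
    return pfm


def consensus(pfm):
    """Return the most frequent base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pfm)


if __name__ == "__main__":
    # Keep the 13 best 4-mers: the exact site plus its 12 single mismatches.
    pfm = build_pfm(4, toy_energy, top_n=13)
    for position, column in enumerate(pfm):
        print(position, column)
    print(consensus(pfm))
```

The exhaustive enumeration is only practical for short sites, since the number of sequences grows as 4^k; the resulting position frequency matrix is the kind of object that could then be compared against experimentally determined motifs or scanned across promoter sequences.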
