Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships

Transcriptional regulation is mediated by the collective binding of proteins called transcription factors to cis-regulatory elements. A handful of factors are known to function at particular distances from the transcription start site, although how widespread this is remains poorly understood. Spatial dependencies can also exist between pairs of binding motifs, facilitating interactions between the factors that bind them. We sought to determine to what extent spatial preferences measured at fine-scale resolution can be used to predict cis-regulatory elements, as well as motif pairs that bind interacting proteins. We introduce the ‘motif positional function’ model, which uses regression analysis to predict spatial biases, distinguishing noise from true position-specific overrepresentation at single-nucleotide resolution. Our method predicts 48 consensus motifs exhibiting positional enrichment within human promoters, including 14 motifs without known binding partners. We then extend the model to analyze distance preferences between pairs of motifs. We find that motif pairs binding interacting factors often co-occur preferentially at multiple distances, and that the intervals between preferred distances often correspond to one turn of the DNA double helix. This offers a new means of predicting sequence elements that act collectively in gene regulation.
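The abstract does not specify the functional form of the motif positional function model, so the following Python sketch only illustrates the general idea under stated assumptions. Motif counts at each offset from the transcription start site are fitted with a smooth low-degree polynomial background by ordinary least squares, and offsets whose counts exceed that background by at least `z_min` Poisson standard deviations are flagged as positionally enriched. Because the same test works on any offset-indexed profile, it can also be applied to counts of inter-motif distances. A second helper then checks whether consecutive preferred distances are spaced by whole helical turns. The function names, the quadratic background, the z-score threshold of 4 and the 10.5 bp helical repeat are all assumptions of this sketch, not the authors' implementation.

```python
import math

# Approximate helical repeat of B-DNA in bp (assumed value for this sketch).
HELICAL_TURN_BP = 10.5


def polyfit_ls(xs, ys, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients c where c[k] multiplies x**k. Intended for small
    degrees with x rescaled to [-1, 1]."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / a[i][i]
    return coef


def positional_enrichment(counts, start, degree=2, z_min=4.0):
    """Flag offsets whose count exceeds a smooth polynomial background.

    counts[i] is the number of motif occurrences at offset start + i.
    Returns (offset, observed, expected, z) tuples, where z is the
    Poisson-standardised residual (observed - expected) / sqrt(expected)."""
    m = len(counts)
    xs = [2.0 * i / (m - 1) - 1.0 for i in range(m)]  # rescale to [-1, 1]
    coef = polyfit_ls(xs, counts, degree)
    hits = []
    for i, (x, obs) in enumerate(zip(xs, counts)):
        # Floor the background so sqrt() and the division stay defined.
        expected = max(sum(c * x ** k for k, c in enumerate(coef)), 1e-9)
        z = (obs - expected) / math.sqrt(expected)
        if z >= z_min:
            hits.append((start + i, obs, expected, z))
    return hits


def helical_phase_consistent(distances, tol=2.0):
    """True if every gap between consecutive (sorted) preferred distances lies
    within tol bp of a whole, non-zero number of helical turns."""
    if len(distances) < 2:
        return False
    for a, b in zip(distances, distances[1:]):
        gap = b - a
        turns = round(gap / HELICAL_TURN_BP)
        if turns < 1 or abs(gap - turns * HELICAL_TURN_BP) > tol:
            return False
    return True


if __name__ == "__main__":
    profile = [20] * 201  # offsets -100..+100 with a flat background
    profile[70] = 80      # spike at offset -30
    for hit in positional_enrichment(profile, start=-100):
        print(hit)
    print(helical_phase_consistent([12, 22, 33]))
```

For example, a flat profile of 20 occurrences per offset over -100..+100 with a spike of 80 at offset -30 yields a single hit at -30. A global polynomial is the simplest possible background; over long promoter windows a local (windowed) regression, or a second fit that excludes the first round of hits, would be a reasonable refinement.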
