Conserved elements with potential to form polymorphic G-quadruplex structures in the first intron of human genes

To understand how potential for G-quadruplex formation might influence regulation of gene expression, we examined the 2 kb spanning the transcription start site (TSS) of each of the 18 217 human RefSeq genes, distinguishing the contributions of the template and nontemplate strands. Regions both upstream and downstream of the TSS are G-rich, but the downstream region displays a clear bias toward G-richness on the nontemplate strand. Upstream of the TSS, much of the G-richness and potential for G-quadruplex formation derives from well-defined canonical regulatory motifs in duplex DNA, including CpG dinucleotides, which are sites of regulatory methylation, and motifs recognized by the transcription factor SP1. This challenges the notion that quadruplex formation upstream of the TSS contributes to regulation of gene expression. Downstream of the TSS, G-richness is concentrated in the first intron and on the nontemplate strand. There, polymorphic sequence elements that have potential to form G-quadruplex structures, and that cannot be accounted for by known regulatory motifs, are found in almost 3000 (16%) of the human RefSeq genes and are conserved through frogs. These elements could in principle be recognized either as DNA or as RNA, providing structural targets for regulation at the level of transcription or RNA processing.
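
The strand-specific comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the widely used motif definition of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, which the abstract does not state, and the function names are hypothetical. The input is taken to be the nontemplate (coding) strand written 5' to 3'; the template strand is obtained as its reverse complement.

```python
import re

# Assumed motif definition (not stated in the abstract): four runs of >= 3 G
# separated by loops of 1-7 nt.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}", re.IGNORECASE)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def g_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def count_g4_motifs(seq: str) -> int:
    """Count non-overlapping matches of the assumed G4 motif in seq."""
    return len(G4_PATTERN.findall(seq))


def strand_summary(nontemplate: str) -> dict:
    """Compare G-richness and motif counts on the two strands of a region."""
    template = reverse_complement(nontemplate)
    return {
        "nontemplate_g_fraction": g_fraction(nontemplate),
        "template_g_fraction": g_fraction(template),
        "nontemplate_g4_motifs": count_g4_motifs(nontemplate),
        "template_g4_motifs": count_g4_motifs(template),
    }


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from any gene.
    region = "ATGGGAGGGTGGGAGGGCAT"
    print(strand_summary(region))
```

Applied to a window downstream of a TSS, a higher motif count or G fraction for the nontemplate strand than for the template strand would correspond to the bias the abstract reports. A regular expression counts only non-overlapping matches, so it measures potential for quadruplex formation, not whether a structure actually forms.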
