Re-evaluation of G-quadruplex propensity with G4Hunter

Critical evidence for the biological relevance of G-quadruplexes (G4) has recently been obtained in seminal studies performed in a variety of organisms. These four-stranded, non-canonical DNA structures are promising drug targets because they appear to be involved in a number of key biological processes. Given the growing interest in G4, accurate tools are needed to predict the G-quadruplex propensity of a given DNA or RNA sequence. Several algorithms, such as Quadparser, predict quadruplex-forming propensity. However, a number of studies have established that some sequences not detected by these tools do form G4 structures (false negatives), whereas other sequences predicted to form G4 structures do not (false positives). Here we report the development and testing of a radically different algorithm, G4Hunter, which takes into account the G-richness and G-skewness of a given sequence and outputs a quadruplex propensity score. To validate this model, we tested it on a large dataset of 392 published sequences and experimentally evaluated the quadruplex-forming potential of 209 sequences using a combination of biophysical methods to assess quadruplex formation in vitro. We also experimentally validated the G4Hunter algorithm on a short complete genome, that of the human mitochondria (16.6 kb), chosen for its relatively high GC content and GC skewness and for the biological relevance of quadruplexes near its instability hotspots. We then applied the algorithm to the genomes of a number of species, including humans, which allowed us to conclude that the number of sequences capable of forming stable quadruplexes (at least in vitro) in the human genome is significantly higher, by a factor of 2–10, than previously thought.
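
To make the idea of a score that combines G-richness and G-skewness concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not spell out the scoring rule, so the scheme below is an assumption based on the commonly described G4Hunter convention: each G in a run of n consecutive Gs contributes min(n, 4), each C in a run of n consecutive Cs contributes -min(n, 4), all other bases contribute 0, and the score is the mean over the sequence. The function name `g4hunter_style_score` is hypothetical.

```python
import re


def g4hunter_style_score(seq: str) -> float:
    """Mean per-base score: G runs count positively, C runs negatively.

    Assumed scoring rule (not stated in the abstract): every base in a
    run of n identical G (or C) gets +min(n, 4) (or -min(n, 4)); all
    other bases, including A, T and U, get 0.
    """
    seq = seq.upper()
    if not seq:
        return 0.0

    total = 0
    # Walk over maximal runs of G or C; everything else scores 0.
    for match in re.finditer(r"G+|C+", seq):
        run = match.group()
        per_base = min(len(run), 4)          # cap the run contribution at 4
        sign = 1 if run[0] == "G" else -1    # C runs signal G4 on the other strand
        total += sign * per_base * len(run)

    return total / len(seq)


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"      # human telomeric repeat motif
    print(round(g4hunter_style_score(telomeric), 3))
```

Under this assumed rule, a G-rich and G-skewed sequence scores well above zero, its C-rich complement scores the same magnitude with the opposite sign, and a sequence with balanced or absent G/C runs scores near zero. In practice such a score is typically computed in a sliding window and compared against a threshold; both the window size and the threshold are parameters not given in the abstract.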
