Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: it decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and experimentally validate non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates) and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.
