A virtual sequencer reveals the dephasing patterns in error-correction code DNA sequencing

Abstract

An error-correction code (ECC) sequencing approach was recently reported to reduce sequencing errors effectively by interrogating each DNA fragment with three orthogonal degenerate sequencing-by-synthesis (SBS) reactions. However, as in other SBS methods that do not read single molecules, the reactions within a molecular colony gradually lose synchronization during ECC sequencing. This phenomenon, called dephasing, causes sequencing errors and, in ECC sequencing, produces distinctive dephasing patterns. To understand the characteristic dephasing patterns of the dual-base flowgram in ECC sequencing and to develop a correction algorithm, we built a virtual sequencer in silico. Starting from first principles and the chemistry of the sequencing reactions, we simulated ECC sequencing results, identified the key factors that drive dephasing in ECC sequencing chemistry, and designed an effective dephasing-correction algorithm. The results show that the algorithm is applicable to sequencing signals of at least 500 cycles, or an average read length of 1000 bp, with an error rate low enough for subsequent parity checks and ECC deduction. The virtual sequencer and the dephasing algorithm can further be extended to a dichromatic form of ECC sequencing, potentially allowing much more accurate sequencing.
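The following minimal Python sketch of a toy virtual sequencer for one dual-base reaction makes the dephasing mechanism concrete. It rests on a deliberately simplified model that is an assumption, not the authors' model. Each flow delivers two nucleotides at once, alternating between one pair and its complement (here "AC" and "GT", one way of splitting the four nucleotides into two pairs). In every flow, a fixed fraction `lam` of the strands that could extend fails to extend at all (lagging). Carry-forward (leading) is ignored. The signal of a flow is taken as the expected number of bases incorporated per strand. The names `run_length` and `simulate_flowgram` are hypothetical.

```python
"""Toy dual-base flowgram simulator with incomplete-extension dephasing.

Hypothetical, simplified model (not the published one):
- flows alternate between a nucleotide pair and its complement, e.g. ("AC", "GT");
- in each flow a fraction `lam` of extendable strands does not extend (lagging);
- carry-forward (leading) strands are not modelled;
- the flow signal is the expected number of bases incorporated per strand.
"""

from typing import List, Sequence


def run_length(template: str, pos: int, bases: str) -> int:
    """Count consecutive template bases starting at `pos` that belong to `bases`."""
    n = 0
    while pos + n < len(template) and template[pos + n] in bases:
        n += 1
    return n


def simulate_flowgram(template: str, flow_pair: Sequence[str],
                      n_flows: int, lam: float = 0.0) -> List[float]:
    """Return the expected per-strand signal for each of `n_flows` flows."""
    # states[i] = fraction of strands whose next unread template base is i
    states = [0.0] * (len(template) + 1)
    states[0] = 1.0
    signals = []
    for k in range(n_flows):
        bases = flow_pair[k % 2]  # alternate the two complementary base pairs
        new_states = [0.0] * len(states)
        signal = 0.0
        for pos, frac in enumerate(states):
            if frac == 0.0:
                continue
            run = run_length(template, pos, bases)
            if run == 0:
                new_states[pos] += frac  # nothing to incorporate in this flow
                continue
            new_states[pos + run] += frac * (1.0 - lam)  # in phase: full extension
            new_states[pos] += frac * lam                # lagging: no extension
            signal += frac * (1.0 - lam) * run
        states = new_states
        signals.append(signal)
    return signals


if __name__ == "__main__":
    # Perfect synchronization: each flow reports the length of one degenerate run.
    print(simulate_flowgram("ACGTTGCA", ("AC", "GT"), 4))  # prints [2.0, 4.0, 2.0, 0.0]
    # With lagging, part of each run's signal appears in later flows.
    print(simulate_flowgram("ACGTTGCA", ("AC", "GT"), 6, lam=0.1))
```

In this toy model, a strand that misses a flow cannot extend in the next flow, which carries the complementary pair. It catches up only one flow pair later. Its signal therefore leaks into later flows, and a dephasing-correction step has to undo this smearing. A realistic virtual sequencer would also need carry-forward and reaction-kinetics terms, which this sketch omits.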
