Streamlined analysis of duplex sequencing data with Du Novo

Duplex sequencing was originally developed to detect rare nucleotide polymorphisms that are normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results. We then apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components and integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.
