Paper information - Badread: simulation of error-prone long reads

Badread: simulation of error-prone long reads

DNA sequencing platforms aim to measure the sequence of nucleotides (A, C, G and T) in a sample of DNA. Sequencers made by Illumina have been the dominant technology for much of the past decade, but they generate fragments of sequence (‘reads’) that are relatively short (~100–300 nucleotides in length). In contrast, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) produce ‘long-read’ sequencers that can generate reads of tens of thousands of nucleotides or more (Eisenstein, 2017). Long reads from these platforms can be very beneficial for genome assembly and other bioinformatic analyses (Koren, Walenz, Berlin, Miller, & Phillippy, 2017; Phillippy, 2017). ONT and PacBio sequencers achieve their long read lengths because they detect nucleotides in individual molecules of DNA, an approach known as single-molecule sequencing (Heather & Chain, 2016). However, the stochastic nature of measurement at the single-molecule scale means that ONT and PacBio reads are ‘noisy’: they contain a significant number of errors.

Ryan R. Wick

[1] Marghoob Mohiyuddin, et al. LongISLND: in silico sequencing of lengthy and noisy datatypes, 2016, Bioinform.

[2] Michael Eisenstein, et al. An ace in the hole for DNA sequencing, 2017, Nature.

[3] Adam M. Phillippy, et al. New advances in sequence assembly, 2017, Genome Research.

[4] Leping Li, et al. ART: a next-generation sequencing read simulator, 2012, Bioinform.

[5] Justin Chu, et al. NanoSim: nanopore sequence read simulator based on statistical characterization, 2016.

[6] B. Chain, et al. The sequence of sequencers: The history of sequencing DNA, 2016, Genomics.

[7] Kiyoshi Asai, et al. PBSIM: PacBio reads simulator - toward accurate genome assembly, 2013, Bioinform.

[8] S. Koren, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation, 2016, bioRxiv.