MOTIVATION
Third-generation sequencing methods provide longer reads than second-generation methods and have distinct error characteristics. While many read simulators exist for second-generation data, the choice for third-generation data is very limited.
RESULTS
We analyzed public Pacific Biosciences (PacBio) SMRT sequencing data, developed an error model, and implemented it in a new read simulator, SimLoRD. SimLoRD lets the user choose the read length distribution and models error probabilities as a function of the number of passes through the sequencer. The new error model makes SimLoRD the most realistic SMRT read simulator available.
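To illustrate why per-base error probabilities should depend on the number of passes, the following toy sketch computes the probability that a consensus call over several passes is wrong. It is not SimLoRD's error model: it assumes (hypothetically) substitution-only errors, a fixed per-pass error probability p, independent passes, a majority-vote consensus in which all erroneous passes agree, and random tie-breaking. The function name consensus_error is illustrative only.

```python
import math


def consensus_error(p: float, passes: int) -> float:
    """Toy model (not SimLoRD's): probability that a majority vote over
    `passes` independent passes miscalls one position, given a per-pass
    error probability p. Ties are broken uniformly at random."""
    if passes < 1:
        raise ValueError("passes must be >= 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    total = 0.0
    for k in range(passes + 1):
        # probability that exactly k of the passes are wrong at this position
        prob = math.comb(passes, k) * p**k * (1 - p) ** (passes - k)
        if 2 * k > passes:
            total += prob        # wrong passes form a strict majority
        elif 2 * k == passes:
            total += 0.5 * prob  # tie: the random tie-break is wrong half the time
    return total


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, round(consensus_error(0.15, n), 5))
```

Under these assumptions a single pass keeps the raw error rate p, two passes do not improve on it (every disagreement is a coin flip), and each additional odd pass lowers the consensus error further. Real SMRT errors are dominated by insertions and deletions and are not fully independent across passes, which is why an empirically fitted model is needed instead of this idealized one.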
AVAILABILITY AND IMPLEMENTATION
SimLoRD is available as open source at http://bitbucket.org/genomeinformatics/simlord/ and can be installed via Bioconda (http://bioconda.github.io).
CONTACT
Bianca.Stoecker@uni-due.de or Sven.Rahmann@uni-due.de
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.