Systematic review of next-generation sequencing simulators: computational tools, features and perspectives

High-throughput next-generation sequencing (NGS) technologies have rapidly generated large volumes of genomic data. To support the development and evaluation of new statistical models and computational methods, and to help design better experimental workflows, a range of NGS simulators has been proposed. However, the comparative performance of these simulators remains unclear. In this review, we conduct a comprehensive investigation of NGS simulators for various sequencing techniques, including DNA sequencing, metagenomic sequencing, RNA-seq, ChIP-seq and bisulfite sequencing for methylation.
