MVSC: a multi-variation simulator of cancer genome.

BACKGROUND Many forms of variation exist in the genome, and they are the main causes of individual phenotypic differences. Detecting variants, especially those in tumor genomes, remains challenging because of the complexity of genome structure. Tools that detect variation in next-generation sequencing data therefore urgently need performance assessment.

METHODS AND RESULTS We have created a software package, the Multi-Variation Simulator of Cancer genomes (MVSC), to simulate common genomic variants, including single nucleotide polymorphisms, small insertion and deletion polymorphisms, and structural variations (SVs), that are analogous to human somatically acquired variation. Three sets of variations, embedded in the genomic sequence in different periods, are simulated dynamically and sequentially, one set after another. Complex SVs are important in cancer genome simulation because this type of variation is characteristic of tumor genome structure. Overlapping variations of different sizes can also coexist in the same genomic region, adding to the complexity of cancer genome architecture. Our results show that MVSC can efficiently simulate a variety of genomic variants that cannot be simulated by existing software packages.

CONCLUSION MVSC-simulated variants can be used to assess the performance of existing tools designed to detect SVs in next-generation sequencing data. We also find that MVSC is memory- and time-efficient compared with similar software packages.
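The idea of simulating several sets of variation sequentially, so that later variants can overlap earlier ones, can be illustrated with a minimal Python sketch. This is not MVSC's implementation: the function names, the variant counts, the uniform choice of positions, and the use of a single inversion as the structural variant are all assumptions made for illustration only.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_snps(seq, n, rng):
    """Substitute n distinct, randomly chosen positions with a different base."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def apply_indels(seq, n, max_len, rng):
    """Apply n small insertions or deletions of at most max_len bases each."""
    for _ in range(n):
        pos = rng.randrange(len(seq))
        size = rng.randint(1, max_len)
        if rng.random() < 0.5:
            inserted = "".join(rng.choice(BASES) for _ in range(size))
            seq = seq[:pos] + inserted + seq[pos:]
        else:
            seq = seq[:pos] + seq[pos + size:]
    return seq


def apply_inversion(seq, start, end):
    """Replace seq[start:end] with its reverse complement (a simple SV)."""
    inverted = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + inverted + seq[end:]


def simulate(reference, seed=0):
    """Return the genome after each hypothetical period of variation.

    Each stage mutates the output of the previous one, so a later
    variant may overlap a region that was already altered.
    """
    rng = random.Random(seed)
    stages = [reference]
    stages.append(apply_snps(stages[-1], 5, rng))       # period 1: SNPs
    stages.append(apply_indels(stages[-1], 3, 4, rng))  # period 2: small indels
    genome = stages[-1]
    stages.append(apply_inversion(genome, len(genome) // 4, len(genome) // 2))
    return stages


if __name__ == "__main__":
    for i, genome in enumerate(simulate("ACGT" * 50)):
        print(i, len(genome), genome[:40])
```

Because the inversion in the last stage spans a quarter of the genome, it will usually cover positions already changed by the SNP and indel stages, which is the kind of overlap the abstract describes.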
