Simulation of heterogeneous tumour genomes with HeteroGenesis and in silico whole exome sequencing

Abstract

Summary: Tumour evolution results in progressive cancer phenotypes such as metastatic spread and treatment resistance. To better treat cancers, we must characterize tumour evolution and the genetic events that confer progressive phenotypes. This is facilitated by high-coverage genome or exome sequencing. However, the best approach by which, or indeed whether, these data can be used to accurately model and interpret underlying evolutionary dynamics is yet to be confirmed. Establishing this requires sequencing data from appropriately heterogeneous tumours in which the exact trajectory and combination of events occurring throughout their evolution are known. We therefore developed HeteroGenesis: a tool to generate realistically evolved tumour genomes, which can be sequenced using weighted-Wessim (w-Wessim), an in silico exome sequencing tool that we have adapted from previous methods. HeteroGenesis simulates more complex and realistic heterogeneous tumour genomes than existing methods, can model different evolutionary dynamics, and enables the creation of multi-region and longitudinal data.

Availability and implementation: HeteroGenesis and w-Wessim are freely available under the GNU General Public Licence from https://github.com/GeorgetteTanner, implemented in Python and supported on Linux and MS Windows.

Supplementary information: Supplementary data are available at Bioinformatics online.
