Meltos: multi-sample tumor phylogeny reconstruction for structural variants

MOTIVATION: We propose Meltos, a novel computational framework for building tumor phylogeny trees from somatic structural variants (SVs) across multiple samples. Meltos leverages a tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high-confidence SVs and, using a novel optimization formulation, produce a comprehensive tumor lineage tree. While we do not assume that the evolutionary progression of SVs is necessarily the same as that of SNVs, we show that a tumor phylogeny tree built from high-quality somatic SNVs can act as a guide for calling somatic SVs and assigning them to a tree. Meltos uses multiple genomic read signals for potential SV breakpoints in whole-genome sequencing data and proposes a probabilistic formulation for estimating the variant allele fractions (VAFs) of SV events.

RESULTS: To assess the ability of Meltos to correctly refine SNV trees with SV information, we tested it on two simulated datasets, each with five genomes. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor, and a multi-sample breast cancer dataset (Yates et al., 2015) for which the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show that Meltos can place high-confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or use copy-number-corrected estimates.

AVAILABILITY: Meltos is available at https://github.com/ih-lab/Meltos.

[1]  Benjamin J. Raphael,et al.  Integrated Genomic Analyses of Ovarian Carcinoma , 2011, Nature.

[2]  Martin A Nowak,et al.  Timing and heterogeneity of mutations associated with drug resistance in metastatic cancers , 2014, Proceedings of the National Academy of Sciences.

[3]  Joshua F. McMichael,et al.  Clonal evolution in relapsed acute myeloid leukemia revealed by whole genome sequencing , 2011, Nature.

[4]  Yi Kan Wang,et al.  Integrated single-nucleotide and structural variation signatures of DNA-repair deficient human cancers , 2018, bioRxiv.

[5]  Benjamin J. Raphael,et al.  Tumor phylogeny inference using tree-constrained importance sampling , 2017, Bioinform..

[6]  Shankar Vembu,et al.  Inferring clonal evolution of tumors from single nucleotide somatic mutations , 2012, BMC Bioinformatics.

[7]  Shinichi Morishita,et al.  Integrative analysis of genomic alterations in triple-negative breast cancer in association with homologous recombination deficiency , 2017, PLoS genetics.

[8]  E. Eichler,et al.  Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. , 2009, Genome research.

[9]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows–Wheeler transform , 2009, Bioinform..

[10]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[11]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[12]  In-Hee Lee,et al.  Clonal evolution of glioblastoma under therapy , 2016, Nature Genetics.

[13]  Ryan M. Layer,et al.  LUMPY: a probabilistic framework for structural variant discovery , 2012, Genome Biology.

[14]  Junfeng Wang,et al.  Inferring Clonal Composition from Multiple Sections of a Breast Cancer , 2014, PLoS Comput. Biol..

[15]  Jing Liu,et al.  Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma , 2016, American journal of human genetics.

[16]  Bradley P. Coe,et al.  Genome structural variation discovery and genotyping , 2011, Nature Reviews Genetics.

[17]  Benjamin J. Raphael,et al.  Inferring the Mutational History of a Tumor Using Multi-state Perfect Phylogeny Mixtures. , 2016, Cell systems.

[18]  Charles Gawad,et al.  Genome-wide segregation of single nucleotide and structural variants into single cancer cells , 2017, BMC Genomics.

[19]  Nilgun Donmez,et al.  Clonality Inference from Single Tumor Samples Using Low-Coverage Sequence Data , 2017, J. Comput. Biol..

[20]  Li Ding,et al.  Genomic Landscape of Non-Small Cell Lung Cancer in Smokers and Never-Smokers , 2012, Cell.

[21]  Benjamin J. Raphael,et al.  Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data , 2014, Bioinform..

[22]  Sohrab P. Shah,et al.  deStruct: Accurate Rearrangement Detection using Breakpoint Specific Realignment , 2017, bioRxiv.

[23]  E. Eichler,et al.  Simultaneous structural variation discovery among multiple paired-end sequenced genomes. , 2011, Genome research.

[24]  Andrew Menzies,et al.  Subclonal diversification of primary breast cancer revealed by multiregion sequencing , 2015, Nature Medicine.

[25]  Christopher W. Whelan,et al.  Structural Alterations Driving Castration-Resistant Prostate Cancer Revealed by Linked-Read Genome Sequencing , 2018, Cell.

[26]  Benjamin J. Raphael,et al.  An integrative probabilistic model for identification of structural variation in sequencing data , 2012, Genome Biology.

[27]  Johannes G. Reiter,et al.  The molecular evolution of acquired resistance to targeted EGFR blockade in colorectal cancers , 2012, Nature.

[28]  Iman Hajirasouliha,et al.  Fast and scalable inference of multi-sample cancer lineages , 2014, Genome Biology.

[29]  Benjamin J. Raphael,et al.  Reconstruction of clonal trees and tumor composition from multi-sample sequencing data , 2015, Bioinform..

[30]  Mohammed El-Kebir,et al.  SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error , 2018, Bioinform..

[31]  Ken Chen,et al.  Towards accurate characterization of clonal heterogeneity based on structural variation , 2014, BMC Bioinformatics.

[32]  Russell Schwartz,et al.  Deconvolution and phylogeny inference of structural variations in tumor genomic samples , 2018, bioRxiv.

[33]  A. Bouchard-Côté,et al.  PyClone: statistical inference of clonal population structure in cancer , 2014, Nature Methods.

[34]  Jan Schröder,et al.  SVclone: inferring structural variant cancer cell fraction , 2017, bioRxiv.

[35]  Fidel Ramírez,et al.  deepTools2: a next generation web server for deep-sequencing data analysis , 2016, Nucleic Acids Res..

[36]  L. Ding,et al.  novoBreak: local assembly for breakpoint detection in cancer genomes , 2016, Nature Methods.

[37]  Jian Ma,et al.  Allele-Specific Quantification of Structural Variations in Cancer Genomes , 2016, bioRxiv.

[38]  Steven J. M. Jones,et al.  The genetic landscape of high-risk neuroblastoma , 2013, Nature Genetics.

[39]  Niko Beerenwinkel,et al.  BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies , 2015, Genome Biology.

[40]  Nancy R. Zhang,et al.  Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing , 2016, Proceedings of the National Academy of Sciences.

[41]  Can Alkan,et al.  Toolkit for automated and rapid discovery of structural variants. , 2017, Methods.

[42]  Nilgun Donmez,et al.  Clonality inference in multiple tumor samples using phylogeny , 2015, Bioinform..

[43]  Peter J. Campbell,et al.  Evolution of the cancer genome , 2012, Nature Reviews Genetics.

[44]  Iman Hajirasouliha,et al.  A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data , 2014, Bioinform..

[45]  Florian Markowetz,et al.  A phylogenetic latent feature model for clonal deconvolution , 2016, arXiv:1604.01715.

[46]  M. Elowitz,et al.  Challenges and emerging directions in single-cell analysis , 2017, Genome Biology.

[47]  Shankar Vembu,et al.  PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors , 2015, Genome Biology.

[48]  Sohrab P. Shah,et al.  TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data , 2014, Genome research.

[49]  Serafim Batzoglou,et al.  Genome-wide reconstruction of complex structural variants using read clouds , 2016, Nature Methods.

[50]  Gary D. Bader,et al.  Divergent clonal selection dominates medulloblastoma at recurrence , 2016, Nature.