SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data

Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH), which poses significant challenges to the diagnosis and clinical therapy of cancer. Identifying the tumor cell populations (clones) and reconstructing their evolutionary relationships can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH at single-cell resolution. However, technical errors in SCS datasets, including false positives (FP), false negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a non-parametric Bayesian method that reconstructs the clonal populations as clusters of single cells, the genotype of each clone, and the evolutionary relationships between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution that accounts for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and we model cell doublets using a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental datasets suggests that jointly reconstructing tumor clones and the clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support for the inferences it makes.
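To make two of the ingredients named above more concrete, here is a minimal sketch (not SiCloneFit's implementation) of a binary FP/FN error model combined with a Chinese-restaurant-process prior in a single Gibbs-style reassignment of one cell. It rests on several simplifying assumptions: genotypes are binary (0/1), missing entries are encoded as -1, the CRP is flat rather than tree-structured, and a new clone's genotype is an auxiliary draw from a hypothetical Bernoulli(0.5) base distribution, in the style of Neal's auxiliary-variable sampler with one auxiliary component. Doublets, the clonal phylogeny, the finite-site model, and the singleton case are all left out, and every function name is illustrative.

```python
import math
import random
from typing import List, Sequence

MISSING = -1  # hypothetical encoding for an unobserved entry


def site_prob(observed: int, genotype: int, alpha: float, beta: float) -> float:
    """P(observed | true genotype) under a binary FP/FN error model.

    alpha: false-positive rate, P(observed=1 | genotype=0)
    beta:  false-negative rate, P(observed=0 | genotype=1)
    Assumes 0 < alpha < 1 and 0 < beta < 1.
    """
    if genotype == 0:
        return alpha if observed == 1 else 1.0 - alpha
    return 1.0 - beta if observed == 1 else beta


def cell_log_likelihood(obs: Sequence[int], genotype: Sequence[int],
                        alpha: float, beta: float) -> float:
    """Log-likelihood of one cell's observed row given a clone genotype.
    Missing entries contribute nothing."""
    return sum(math.log(site_prob(d, g, alpha, beta))
               for d, g in zip(obs, genotype) if d != MISSING)


def crp_prior_weights(sizes: Sequence[int], concentration: float) -> List[float]:
    """Flat CRP prior for seating one cell: an existing cluster gets weight
    proportional to its size (excluding the cell itself), and a new cluster
    gets weight proportional to the concentration parameter."""
    raw = list(sizes) + [concentration]
    total = sum(raw)
    return [w / total for w in raw]


def assignment_posterior(obs, genotypes, sizes, new_genotype,
                         concentration, alpha, beta) -> List[float]:
    """Posterior over existing clusters plus one new cluster for a single
    cell: CRP prior weight times error-model likelihood, normalized."""
    prior = crp_prior_weights(sizes, concentration)
    candidates = list(genotypes) + [new_genotype]
    log_post = [(math.log(p) if p > 0 else -math.inf)
                + cell_log_likelihood(obs, g, alpha, beta)
                for p, g in zip(prior, candidates)]
    m = max(log_post)  # subtract the max (log-sum-exp) for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def sample_assignment(obs, genotypes, sizes, concentration, alpha, beta,
                      rng: random.Random):
    """One Gibbs-style reassignment. Returns (cluster index, auxiliary
    genotype); the last index means 'open a new clone with that genotype'."""
    # Hypothetical base distribution: each site mutated with probability 0.5.
    new_genotype = [1 if rng.random() < 0.5 else 0 for _ in range(len(obs))]
    probs = assignment_posterior(obs, genotypes, sizes, new_genotype,
                                 concentration, alpha, beta)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return k, new_genotype
```

For example, a cell observed as `[1, 1, 0, 0]` with clones `[1, 1, 0, 0]` and `[0, 0, 1, 1]` of size 2 each is assigned almost surely to the first clone for small error rates, because the prior only mildly reweights what the likelihood already strongly favors. Working in log space keeps the product over many sites from underflowing.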
