Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data

A new generation of scalable single-cell whole-genome sequencing (scWGS) methods [Zahn et al., 2017; Laks et al., 2019] allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment [Laks et al., 2019] challenges the assumptions and scalability of existing phylogenetic tree-building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and an associated Bayesian inference procedure that exploit the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data that provides an attractive statistical-computational trade-off: it simplifies the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move that bounds the cost of each MCMC iteration by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. A third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.

[1]  F. Markowetz,et al.  Cancer Evolution: Mathematical Models and Computational Inference , 2014, Systematic biology.

[2]  Ian T. Fiddes,et al.  Resolving sub-clonal heterogeneity within cell-line growths by single cell sequencing genomic DNA , 2019, bioRxiv.

[3]  Christian P. Robert,et al.  The Bayesian choice : from decision-theoretic foundations to computational implementation , 2007 .

[4]  Tao Wang,et al.  Accurate identification of single nucleotide variants in whole genome amplified single cells , 2017, Nature Methods.

[5]  Richard Simon,et al.  Using single cell sequencing data to model the evolutionary history of a tumor , 2014, BMC Bioinformatics.

[6]  David Bruce Wilson,et al.  Generating random spanning trees more quickly than the cover time , 1996, STOC '96.

[7]  Huanming Yang,et al.  Single-Cell Exome Sequencing Reveals Single-Nucleotide Mutation Characteristics of a Kidney Tumor , 2012, Cell.

[8]  James D. Brenton,et al.  Phylogenetic Quantification of Intra-tumour Heterogeneity , 2013, PLoS Comput. Biol..

[9]  P. Johnston,et al.  Cancer drug resistance: an evolving paradigm , 2013, Nature Reviews Cancer.

[10]  Bengt Sennblad,et al.  Conbase: a software for unsupervised discovery of clonal somatic mutations in single cells through read phasing , 2018, Genome Biology.

[11]  Jian Ma,et al.  Allele-Specific Quantification of Structural Variations in Cancer Genomes , 2016, bioRxiv.

[12]  S. C. Sahinalp,et al.  ReMixT: clone-specific genomic structure estimation in cancer , 2017, Genome Biology.

[13]  Alexandre Bouchard-Côté,et al.  Blang: Bayesian declarative modelling of arbitrary data structures. , 2019 .

[14]  Russell Schwartz,et al.  Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations , 2013, Bioinform..

[15]  N. Carter,et al.  Estimation of rearrangement phylogeny for cancer genomes. , 2012, Genome research.

[16]  John P. Huelsenbeck,et al.  MRBAYES: Bayesian inference of phylogenetic trees , 2001, Bioinform..

[17]  N. Navin,et al.  SiFit: A Method for Inferring Tumor Trees from Single-Cell Sequencing Data under Finite-site Models , 2016, bioRxiv.

[18]  Beverly A. Teicher,et al.  Cancer Drug Resistance , 2006 .

[19]  Benjamin J. Raphael,et al.  Reconstructing genome mixtures from partial adjacencies , 2012, BMC Bioinformatics.

[20]  A. M. Johansen,et al.  Towards Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach , 2013, 1303.3123.

[21]  Arnaud Doucet,et al.  Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme , 2019 .

[22]  J. Rueff,et al.  Cancer Drug Resistance , 2016, Methods in Molecular Biology.

[23]  P. Donnelly,et al.  Inferring coalescence times from DNA sequence data. , 1997, Genetics.

[24]  Florian Markowetz,et al.  OncoNEM: inferring tumor evolution from single-cell sequencing data , 2016, Genome Biology.

[25]  Richard A. Moore,et al.  Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing , 2019, Cell.

[26]  Kieran R. Campbell,et al.  Single cell fitness landscapes induced by genetic and pharmacologic perturbations in cancer , 2020, bioRxiv.

[27]  P. Moral,et al.  Sequential Monte Carlo samplers , 2002, cond-mat/0212648.

[28]  Russell Schwartz,et al.  Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics , 2014, PLoS Comput. Biol..

[29]  S. C. Sahinalp,et al.  nFuse: Discovery of complex genomic rearrangements in cancer using high-throughput sequencing , 2012, Genome research.

[30]  Radford M. Neal.  Slice Sampling , 2003, The Annals of Statistics.

[31]  Michael I. Jordan,et al.  Computational and statistical tradeoffs via convex relaxation , 2012, Proceedings of the National Academy of Sciences.

[32]  S. Gabriel,et al.  Pan-cancer patterns of somatic copy-number alteration , 2013, Nature Genetics.

[33]  Bernard M. E. Moret,et al.  An investigation of phylogenetic likelihood methods , 2003, Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings..

[34]  Ken Chen,et al.  Monovar: single nucleotide variant detection in single cells , 2016, Nature Methods.

[35]  Anne-Mieke Vandamme,et al.  The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing , 2009 .

[36]  P. Nowell The clonal evolution of tumor cell populations. , 1976, Science.

[37]  E. Letouzé,et al.  Analysis of the copy number profiles of several tumor samples from the same patient reveals the successive steps in tumorigenesis , 2010, Genome Biology.

[38]  Minseok Kwon,et al.  Linked-read analysis identifies mutations in single-cell DNA-sequencing data , 2019, Nature Genetics.

[39]  A. Schäffer,et al.  The evolution of tumour phylogenetics: principles and practice , 2017, Nature Reviews Genetics.

[40]  Samuel Aparicio,et al.  Scalable whole-genome single-cell library preparation without preamplification , 2017, Nature Methods.

[41]  N. Beerenwinkel,et al.  Advances in understanding tumour evolution through single-cell sequencing , 2017, Biochimica et biophysica acta. Reviews on cancer.