gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data

Background: Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, reconstructing the phylogeny that represents the evolutionary history of a tumor has peculiar aspects that depend on the technology used to obtain the data. Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there is recent evidence of back mutations in cancer, a phenomenon that is currently widely ignored.

Results: We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps.

Conclusions: gpps provides new insights into the analysis of intra-tumor heterogeneity by proposing a new progression model for cancer phylogeny reconstruction on Single Cell data.
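The progression model described above (each mutation may be lost at most a fixed number of times) can be illustrated with a small check on a labeled tree. This is only a minimal sketch, not the ILP formulation used by gpps: the tree encoding (`parent`, `events`), the function names, and the additional assumption that each mutation is gained exactly once are all hypothetical choices made for illustration.

```python
from collections import Counter


def genotype(parent, events, node):
    """Set of mutations present at `node`, obtained by replaying
    the gain/loss events on the path from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    present = set()
    for n in reversed(path):  # root first
        for mut, sign in events.get(n, []):
            if sign == "+":
                present.add(mut)
            else:
                present.discard(mut)
    return present


def respects_loss_bound(parent, events, k):
    """True if every mutation is gained exactly once (an assumption of
    this sketch) and lost at most k times, with each loss happening
    below a point where the mutation is actually present.

    parent: dict node -> parent node (the root maps to None)
    events: dict node -> list of (mutation, "+" or "-") on the edge
            entering that node
    """
    gains, losses = Counter(), Counter()
    for node, node_events in events.items():
        above = genotype(parent, events, parent[node])
        for mut, sign in node_events:
            if sign == "+":
                gains[mut] += 1
            else:
                if mut not in above:
                    return False  # cannot lose a mutation that is absent
                losses[mut] += 1
    gained_once = all(count == 1 for count in gains.values())
    bounded = all(count <= k for count in losses.values())
    return gained_once and bounded


# Hypothetical toy tree: m1 is gained at "a" and lost again at "b".
parent = {"root": None, "a": "root", "b": "a", "c": "a"}
events = {"a": [("m1", "+")], "b": [("m1", "-")], "c": [("m2", "+")]}
```

With this toy tree, `respects_loss_bound(parent, events, 1)` accepts the single loss of `m1`, while a bound of `k = 0` (no losses allowed, as in a perfect phylogeny) rejects it.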
