Does Relaxing the Infinite Sites Assumption Give Better Tumor Phylogenies? An ILP-Based Comparative Approach

Most approaches to reconstructing evolutionary history are based on the infinite sites assumption, which states that each mutation appears only once in the evolutionary history. The Perfect Phylogeny model follows from the infinite sites assumption and has been widely used to infer cancer evolution. Nonetheless, recent results show that recurrent and back mutations are present in the evolutionary history of tumors, so the Perfect Phylogeny model might be too restrictive. We propose an approach that allows both the loss of previously acquired mutations and multiple acquisitions of a character. Moreover, we provide an ILP formulation for the evolutionary tree reconstruction problem. Our formulation allows us to tackle both the Incomplete Directed Phylogeny problem and the Clonal Reconstruction problem when general evolutionary models are considered. The latter problem is fundamental in cancer genomics: the goal is to study the evolutionary history of a tumor, taking as input the fraction of cells that carry a given mutation in each of a set of cancer samples. For the Clonal Reconstruction problem, an experimental analysis shows the advantage of allowing mutation losses. Namely, on both real and simulated datasets, our ILP approach provides a better interpretation of the evolutionary history than a Perfect Phylogeny does. The software is available at https://github.com/AlgoLab/gppf.
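To make the restriction imposed by the infinite sites assumption concrete, the following minimal sketch tests whether a complete binary matrix (rows are samples or clones, columns are mutations, root assumed to carry no mutations) is compatible with a directed Perfect Phylogeny. It uses the classical pairwise conflict test: two columns conflict when some rows show the patterns (1,1), (1,0), and (0,1) across them. This is not the ILP formulation of the paper; the function names `conflicting_pairs` and `admits_perfect_phylogeny` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the column pairs (i, j) in which rows exhibit all three
    patterns (1,1), (1,0) and (0,1).

    Assumption: `matrix` is a complete binary matrix (list of equal-length
    lists of 0/1) and the root of the tree is the all-zero state.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    forbidden = {(1, 1), (1, 0), (0, 1)}
    conflicts = []
    for i, j in combinations(range(n_cols), 2):
        # Collect the distinct value pairs that rows take on columns i and j.
        seen = {(row[i], row[j]) for row in matrix}
        if forbidden <= seen:
            conflicts.append((i, j))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True when no pair of columns conflicts, i.e. the matrix can be
    explained without mutation losses or recurrent mutations."""
    return not conflicting_pairs(matrix)


if __name__ == "__main__":
    nested = [[1, 0], [1, 1], [0, 0]]       # mutation 1 occurs only within mutation 0
    conflicting = [[1, 0], [0, 1], [1, 1]]  # neither mutation contains the other
    print(admits_perfect_phylogeny(nested))
    print(conflicting_pairs(conflicting))
```

A matrix that fails this test cannot be explained under the infinite sites assumption, which is the situation where a model that permits mutation losses or recurrent acquisitions becomes relevant.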
