Inferring models of multiscale copy number evolution for single-tumor phylogenetics

Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models.

Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identification of most parsimonious combinations of single gene and single chromosome events. We extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Application of our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival.

Availability and implementation: Our software (FISHtrees) and two datasets are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.

Contact: russells@andrew.cmu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
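The EM-like idea named in the Results, alternating tree reconstruction with re-estimation of event rates, can be illustrated with a small sketch. This is not the FISHtrees implementation. It assumes only single-gene gain and loss events (no chromosome or whole-genome events), replaces the paper's parsimony algorithms with a greedy Prim-style tree heuristic, and uses hypothetical names throughout (`edge_events`, `build_tree`, `estimate_rates`, `fit`, and the pseudocount).

```python
import math
from collections import Counter

def edge_events(parent, child):
    """Count single-gene gain/loss events turning one copy number profile into another."""
    events = Counter()
    for gene, (a, b) in enumerate(zip(parent, child)):
        if b > a:
            events[(gene, "gain")] += b - a
        elif b < a:
            events[(gene, "loss")] += a - b
    return events

def edge_cost(parent, child, rates):
    """Weighted cost: each event costs -log(rate), so frequent events are cheap."""
    return sum(n * -math.log(rates[ev]) for ev, n in edge_events(parent, child).items())

def build_tree(profiles, root, rates):
    """Greedy heuristic (assumption): repeatedly attach the cheapest profile to the tree."""
    in_tree = [root]
    remaining = sorted(set(profiles) - {root})
    edges = []
    while remaining:
        _, parent, child = min(
            (edge_cost(u, v, rates), u, v) for u in in_tree for v in remaining
        )
        edges.append((parent, child))
        in_tree.append(child)
        remaining.remove(child)
    return edges

def estimate_rates(edges, n_genes, pseudocount=1.0):
    """Re-estimate event rates from event counts on the current tree."""
    counts = Counter()
    for u, v in edges:
        counts.update(edge_events(u, v))
    types = [(g, d) for g in range(n_genes) for d in ("gain", "loss")]
    total = sum(counts[t] + pseudocount for t in types)
    return {t: (counts[t] + pseudocount) / total for t in types}

def fit(profiles, root, max_iter=20):
    """Alternate tree building and rate estimation until the tree stops changing."""
    n_genes = len(root)
    types = [(g, d) for g in range(n_genes) for d in ("gain", "loss")]
    rates = {t: 1.0 / len(types) for t in types}  # start from uniform rates
    edges = None
    for _ in range(max_iter):
        new_edges = build_tree(profiles, root, rates)
        rates = estimate_rates(new_edges, n_genes)
        if edges is not None and set(new_edges) == set(edges):
            break
        edges = new_edges
    return edges, rates

# Toy data: two probes per cell, with a diploid root (2, 2).
cells = [(2, 2), (3, 2), (4, 2), (2, 1)]
tree_edges, event_rates = fit(cells, root=(2, 2))
```

In this toy run the gain of gene 0 occurs on two tree edges, so its estimated rate ends up higher than that of the unobserved loss of gene 0. The uniform-rate model mentioned in the Motivation corresponds to skipping the re-estimation step.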
