CAPRI: Efficient Inference of Cancer Progression Models from Cross-sectional Data

We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems.

Motivation: Several cancer-related genomic datasets have become available (e.g., The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion, providing all measurements at the time of diagnosis. Our goal is to infer cancer “progression” models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of “selectivity” relations, where a mutation in a gene A “selects” for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices.

Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy, and has good complexity and convergence properties. CAPRI performs especially well in the presence of noise in the data and with limited sample sizes. Moreover, in contrast to other approaches, CAPRI robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered nontrivial selectivity relations and exclusivity patterns among key genomic events.

Availability: CAPRI is part of the TRanslational ONCOlogy (TRONCO) R package and is freely available on the web at: http://bimib.disco.unimib.it/index.php/Tronco

Contact: daniele.ramazzotti@disco.unimib.it
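To make the Suppes-based scoring mentioned in the Results more concrete, the sketch below filters candidate selectivity relations on a toy binary dataset. It is an illustration only, not the CAPRI implementation: it assumes the two conditions commonly associated with Suppes' theory, namely temporal priority (approximated here by P(A) > P(B) in cross-sectional data) and probability raising (P(B | A) > P(B | not A)), and it omits the bootstrap and maximum likelihood steps. The function name `suppes_candidates` and the toy samples are hypothetical.

```python
from itertools import permutations


def suppes_candidates(samples, events):
    """Return ordered pairs (a, b) where a is a candidate 'selector' of b.

    Assumption (not taken from the abstract): a pair is kept when
      1. temporal priority holds, approximated by P(a) > P(b), and
      2. probability raising holds, P(b | a) > P(b | not a).
    `samples` is a list of dicts mapping event name -> 0/1.
    """
    n = len(samples)
    freq = {e: sum(s[e] for s in samples) / n for e in events}
    pairs = []
    for a, b in permutations(events, 2):
        with_a = [s for s in samples if s[a]]
        without_a = [s for s in samples if not s[a]]
        if not with_a or not without_a:
            continue  # conditional probabilities are undefined
        p_b_given_a = sum(s[b] for s in with_a) / len(with_a)
        p_b_given_not_a = sum(s[b] for s in without_a) / len(without_a)
        if freq[a] > freq[b] and p_b_given_a > p_b_given_not_a:
            pairs.append((a, b))
    return pairs


# Hypothetical cross-sectional data: one dict per patient, 1 = event observed.
samples = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 0, "B": 0, "C": 0},
]

print(suppes_candidates(samples, ["A", "B", "C"]))  # [('A', 'B')]
```

In this toy dataset only the pair (A, B) passes both checks: A is more frequent than B, and B is observed only in samples that also carry A. In a full pipeline such pairs would be candidates only, still subject to the statistical testing and likelihood-based fitting described above.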
