Inferring tree causal models of cancer progression with probability raising

Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise, we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state of the art, that it is efficient even with a relatively small number of samples, and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences between the progressions we infer and those inferred by competing techniques, and we show how to validate conjectured biological relations with progression models.
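To make the notion of probabilistic causation concrete, the following minimal sketch checks the two conditions usually associated with Suppes' definition on cross-sectional binary data: temporal priority and probability raising, P(b | a) > P(b | not a). It is an illustration only, not the CAPRESE algorithm or its shrinkage-like estimator. The function names, the toy dataset, and the use of marginal frequency (P(a) > P(b)) as a stand-in for temporal priority are assumptions made for this example.

```python
from typing import Optional, Sequence

# Each sample is a tuple of 0/1 flags, one per event (e.g. a genetic alteration).
Samples = Sequence[Sequence[int]]


def prob(samples: Samples, event: int) -> float:
    """Observed marginal frequency of an event across samples."""
    return sum(s[event] for s in samples) / len(samples)


def cond_prob(samples: Samples, effect: int, cause: int, cause_value: int) -> Optional[float]:
    """Observed frequency of `effect` among samples where `cause` equals `cause_value`.

    Returns None when no sample satisfies the condition.
    """
    rows = [s for s in samples if s[cause] == cause_value]
    if not rows:
        return None
    return sum(r[effect] for r in rows) / len(rows)


def raises_probability(samples: Samples, cause: int, effect: int) -> bool:
    """Probability raising: P(effect | cause) > P(effect | not cause)."""
    p_with = cond_prob(samples, effect, cause, 1)
    p_without = cond_prob(samples, effect, cause, 0)
    if p_with is None or p_without is None:
        return False
    return p_with > p_without


def is_prima_facie_cause(samples: Samples, cause: int, effect: int) -> bool:
    """Both conditions together.

    Assumption for this sketch: in an accumulative process, an earlier event
    is observed more often, so P(cause) > P(effect) stands in for temporal priority.
    """
    earlier = prob(samples, cause) > prob(samples, effect)
    return earlier and raises_probability(samples, cause, effect)


# Hypothetical toy data: column 0 is event a, column 1 is event b.
# Here b is only ever observed together with a.
samples = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 0), (1, 1)]

a_causes_b = is_prima_facie_cause(samples, cause=0, effect=1)
b_causes_a = is_prima_facie_cause(samples, cause=1, effect=0)
```

On this toy data, probability raising holds in both directions (it is a symmetric relation), so the frequency-based ordering is what selects a as the candidate cause of b rather than the reverse. With noisy observations the raw comparison becomes unreliable, which is the motivation the abstract gives for a shrinkage-like estimator.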
