Inference of Cancer Progression Models with Biological Noise

Many applications in translational medicine require an understanding of how diseases progress through the accumulation of persistent events. Specialized Bayesian networks called monotonic progression networks offer a statistical framework for modeling such phenomena. Current machine learning tools for reconstructing Bayesian networks from data are powerful but not suited to progression models. We combine advances in machine learning with a rigorous philosophical theory of causation to produce Polaris, a scalable algorithm for learning progression networks that accounts for causal or biological noise as well as logical relations among genetic events, making the resulting models easy to interpret qualitatively. We tested Polaris on synthetically generated data and showed that it outperforms a widely used machine learning algorithm and approaches the performance of a competing special-purpose, albeit clairvoyant, algorithm that is given a priori information about the model parameters. We also prove that, under certain rather mild conditions, Polaris is guaranteed to converge for sufficiently large sample sizes. Finally, we applied Polaris to point mutation and copy number variation data in prostate cancer from The Cancer Genome Atlas (TCGA) and found that there are likely three distinct progressions: one major androgen-driven progression, one major non-androgen-driven progression, and one novel minor androgen-driven progression.
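To make the notion of a monotonic progression network with noise more concrete, the following is a minimal illustrative sketch, not the Polaris algorithm itself. It assumes a purely conjunctive relation (an event normally requires all of its parent events) and a single noise level `eps` giving the probability that an event appears even though its parents are not all present. The function name `sample_genotype`, the parameters `theta` and `eps`, and the event labels are hypothetical and do not come from the paper.

```python
import random


def sample_genotype(parents, theta, eps, rng):
    """Sample one genotype from a conjunctive progression network (sketch).

    parents: dict mapping each event to a list of its parent events;
             keys must be listed in topological order (parents first).
    theta:   dict mapping each event to P(event | all parents present).
    eps:     assumed noise level, P(event | at least one parent absent).
    rng:     a random.Random instance.
    """
    genotype = {}
    for event, event_parents in parents.items():
        # Conjunctive relation: every parent must already have occurred.
        satisfied = all(genotype[p] for p in event_parents)
        p_occurs = theta[event] if satisfied else eps
        genotype[event] = rng.random() < p_occurs
    return genotype


# Hypothetical three-event network: A first, then B, then C (needs A and B).
example_parents = {"A": [], "B": ["A"], "C": ["A", "B"]}
example_theta = {"A": 0.6, "B": 0.5, "C": 0.4}
sample = sample_genotype(example_parents, example_theta, 0.05, random.Random(0))
```

With `eps = 0` every sampled genotype respects the progression order exactly; a positive `eps` produces the occasional out-of-order observation that a learning algorithm for such networks has to tolerate.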
