Algorithmic methods to infer the evolutionary trajectories in cancer progression

The genomic evolution inherent to cancer relates directly to a renewed focus on voluminous next-generation sequencing (NGS) data and on machine learning for inferring explanatory models of how (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. We introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications, as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel, experimentally verifiable hypotheses.

Statement of Significance: A new causality-based machine learning Pipeline for Cancer Inference (PiCnIc) is introduced to infer the underlying somatic evolution of ensembles of tumors from next-generation sequencing data. PiCnIc combines techniques for sample stratification, driver selection and identification of fitness-equivalent exclusive alterations to exploit a novel algorithm based on Suppes’ probabilistic causation. The accuracy and translational significance of the results are studied in detail, with an application to colorectal cancer. The PiCnIc pipeline has been made publicly accessible for reproducibility, interoperability and future enhancements.
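To make the notion of Suppes’ probabilistic causation more concrete, the sketch below checks its two standard conditions on a toy binary sample-by-event table: temporal priority, approximated in cross-sectional data by the "cause" being more frequent than the "effect", and probability raising, P(b | a) > P(b | not a). This is only an illustration under stated assumptions, not the PiCnIc implementation. The function name `suppes_edges`, the event labels and the toy data are hypothetical, and the statistical testing and model selection that a real pipeline would need are omitted.

```python
from typing import Dict, List, Tuple


def suppes_edges(samples: List[Dict[str, int]],
                 events: List[str]) -> List[Tuple[str, str]]:
    """Return ordered pairs (a, b) that satisfy both Suppes conditions.

    samples: one dict per tumor sample, mapping event name -> 0/1.
    events:  event names to compare pairwise (hypothetical labels).
    """
    n = len(samples)
    edges = []
    for a in events:
        for b in events:
            if a == b:
                continue
            n_a = sum(s[a] for s in samples)             # samples with a
            n_b = sum(s[b] for s in samples)             # samples with b
            n_ab = sum(s[a] and s[b] for s in samples)   # samples with both
            # Both conditional probabilities must be defined.
            if n_a == 0 or n_a == n:
                continue
            # Temporal priority, approximated by marginal frequency.
            temporal_priority = n_a > n_b
            # Probability raising: P(b | a) > P(b | not a).
            p_b_given_a = n_ab / n_a
            p_b_given_not_a = (n_b - n_ab) / (n - n_a)
            if temporal_priority and p_b_given_a > p_b_given_not_a:
                edges.append((a, b))
    return edges


# Toy cross-sectional data: 5 samples, 2 hypothetical events.
toy_samples = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 0},
    {"A": 0, "B": 0},
    {"A": 0, "B": 0},
]
print(suppes_edges(toy_samples, ["A", "B"]))
```

In this toy table, event A occurs in 3 of 5 samples and B in 2, and B appears only in samples that also carry A, so the pair (A, B) passes both conditions while (B, A) fails temporal priority.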
