Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV

Background

Disease progression models are important for understanding the critical steps during the development of diseases. The models are embedded in a statistical framework to deal with random variation due to biology and to the sampling process when only a finite population is observed. Conditional probabilities are used to describe dependencies between the events that characterise the critical steps in the disease process.

Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular model class that is easy to understand yet flexible. They have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, and these have not yet been investigated in detail.

Results

We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of which are completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that the preselection of events by clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected.

Conclusions

The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways.
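The clique-based preselection described in the Results can be illustrated with a minimal sketch. It assumes that an undirected graph over the events has already been built; the abstract does not say how edges or vertex weights are defined, so the toy graph `GRAPH`, the weights `WEIGHTS` (read here as event frequencies), and all function names are hypothetical. The sketch enumerates maximal cliques with the Bron-Kerbosch algorithm and then picks either the largest clique or the one with the greatest total vertex weight; it is not necessarily the procedure used in the paper.

```python
from typing import Dict, FrozenSet, Iterator, Optional, Set

Graph = Dict[str, Set[str]]  # event -> set of neighbouring events


def maximal_cliques(
    graph: Graph,
    r: Optional[Set[str]] = None,
    p: Optional[Set[str]] = None,
    x: Optional[Set[str]] = None,
) -> Iterator[FrozenSet[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        yield frozenset(r)  # r cannot be extended: it is a maximal clique
        return
    # Pivot on the vertex with the most neighbours among the candidates.
    pivot = max(p | x, key=lambda v: len(graph[v] & p))
    for v in list(p - graph[pivot]):
        yield from maximal_cliques(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}  # v has been fully explored
        x = x | {v}  # exclude v from later cliques at this level


def largest_clique(graph: Graph) -> Set[str]:
    """Select the events in the clique with the most vertices."""
    return set(max(maximal_cliques(graph), key=len))


def max_weight_clique(graph: Graph, weights: Dict[str, float]) -> Set[str]:
    """Select the events in the clique with the largest total weight.

    Assumes strictly positive weights, so the optimum is a maximal clique.
    """
    return set(max(maximal_cliques(graph),
                   key=lambda c: sum(weights[v] for v in c)))


# Hypothetical toy data: five events, edges A-B, A-C, B-C, C-D, D-E.
GRAPH: Graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Hypothetical vertex weights, e.g. relative event frequencies.
WEIGHTS = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.5, "E": 0.4}

if __name__ == "__main__":
    print(sorted(largest_clique(GRAPH)))
    print(sorted(max_weight_clique(GRAPH, WEIGHTS)))
```

In this toy graph the two criteria select different event sets: the largest clique favours the densely connected triangle, while the maximum weight clique favours the two heavily weighted events. Exhaustive enumeration is exponential in the worst case, so a dedicated graph library may be preferable when many events are involved.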
