Inferring Sequential Order of Somatic Mutations during Tumorigenesis Based on a Markov Chain Model

Tumors develop and progress as mutations accumulate in DNA sequences during tumorigenesis. Identifying the temporal order of gene mutations in cancer initiation and development is a challenging problem. Such an order not only provides new insight into tumorigenesis at the level of genome sequences but also offers an effective tool for the early diagnosis of tumors and for preventive medicine. In this paper, we develop a novel method, TOMC (Temporal Order based on Markov Chain), that accurately estimates the sequential order of gene mutations during tumorigenesis from genome sequencing data using a Markov chain model. We also provide a new criterion to infer the order of samples or patients, which can characterize the severity or stage of the disease. We applied our method to several high-throughput tumor datasets. First, we found that tumor suppressor genes (TSGs) tend to be mutated before oncogenes; mutations in these two gene classes are considered key events underlying functional loss and gain, respectively, during tumorigenesis. Second, comparisons with existing methods demonstrated that our approach has clear advantages because it accounts for dependence among gene mutations, such as co-mutation. Third, and most importantly, our method can infer the order of patients or samples, quantitatively characterizing the severity of their tumors. Our work therefore provides a new way to quantitatively understand tumor development and progression from high-throughput sequencing data.
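As a minimal illustration of how a Markov chain can turn co-mutation data into an ordering, consider the toy sketch below. It is an assumed construction, not the TOMC estimator itself. It takes a binary sample-by-gene mutation matrix `M` and defines a random walk that steps from gene j to gene i with probability proportional to P(i mutated | j mutated); genes that are present whenever other genes are mutated accumulate more stationary probability and are read as earlier events. The function names `gene_order_scores` and `sample_stages`, the damping factor, and the "furthest gene reached" stage score for samples are all hypothetical choices made for this example.

```python
from typing import List, Sequence

# Toy construction assumed for illustration only; this is not the TOMC model.


def gene_order_scores(
    M: Sequence[Sequence[int]], damping: float = 0.85, iters: int = 200
) -> List[float]:
    """Stationary probability of each gene under a toy co-mutation random walk.

    M[s][g] is 1 if gene g is mutated in sample s, else 0. Under the assumed
    construction, a higher score suggests an earlier mutation.
    """
    n_samples, n_genes = len(M), len(M[0])
    count = [sum(M[s][g] for s in range(n_samples)) for g in range(n_genes)]
    # co[i][j]: number of samples in which genes i and j are both mutated
    co = [
        [sum(M[s][i] * M[s][j] for s in range(n_samples)) for j in range(n_genes)]
        for i in range(n_genes)
    ]
    # Step j -> i with probability P(i mutated | j mutated) / (n_genes - 1);
    # the leftover mass stays on j as a self-loop, so every row sums to 1.
    P = [[0.0] * n_genes for _ in range(n_genes)]
    for j in range(n_genes):
        for i in range(n_genes):
            if i != j and count[j] > 0:
                P[j][i] = co[i][j] / count[j] / max(n_genes - 1, 1)
        P[j][j] = 1.0 - sum(P[j])
    # Power iteration with uniform teleportation (PageRank-style damping)
    # so the chain stays irreducible even if some genes never co-occur.
    pi = [1.0 / n_genes] * n_genes
    for _ in range(iters):
        nxt = [(1.0 - damping) / n_genes] * n_genes
        for j in range(n_genes):
            for i in range(n_genes):
                nxt[i] += damping * pi[j] * P[j][i]
        pi = nxt
    return pi


def sample_stages(M: Sequence[Sequence[int]], pi: Sequence[float]) -> List[int]:
    """Hypothetical sample score: the latest position along the inferred gene
    order that each sample has reached (-1 if it carries no mutations)."""
    order = sorted(range(len(pi)), key=lambda g: -pi[g])  # earliest gene first
    rank = {g: r for r, g in enumerate(order)}
    return [max((rank[g] for g, m in enumerate(row) if m), default=-1) for row in M]


if __name__ == "__main__":
    M = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
    pi = gene_order_scores(M)
    stages = sample_stages(M, pi)
    print(sorted(range(3), key=lambda g: -pi[g]))          # gene order
    print(sorted(range(len(M)), key=lambda s: stages[s]))  # sample order
```

On this four-sample toy matrix, gene 0 is mutated in every sample, gene 1 in two of them, and gene 2 only together with both; the walk ranks the genes 0, 1, 2 and orders the samples 0, 3, 1, 2. The self-loop keeps the asymmetry of the conditional probabilities (gene 0 is always present when gene 1 is, but not vice versa), which plain row normalization would erase.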