Predicting DNA mutations during cancer evolution

Biological systems are inherently complex information-processing systems, and their physiological complexity makes it difficult to formulate and test hypotheses about their behaviour. Our goal was to test a computational framework using published data from a longitudinal study of patients with acute myeloid leukaemia (AML), in which DNA from both normal and malignant tissue was analysed by next-generation sequencing (NGS) at several points in time. We processed the sequencing data obtained before relapse to predict which regions of the genome would be mutated at relapse, and then compared these predictions with the regions that were actually found to be mutated by genome sequencing at relapse. After a detailed statistical analysis, the resulting correlation coefficient (the degree of agreement between the framework's predictions and the observed data) is 0.9816 ± 0.009 (95% confidence interval). This level of agreement suggests that the proposed framework could open new research opportunities for bioinformatics researchers and clinicians.
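The abstract does not say which correlation coefficient was used or how its confidence interval was derived. As an illustration only, the sketch below assumes a Pearson correlation between predicted and observed per-region mutation scores, with a 95% interval from the Fisher z-transform. The function names and the sample numbers are hypothetical and are not taken from the study. The Fisher interval is generally asymmetric around r, so the symmetric ± reported above may well come from a different procedure.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(predicted)
    if n != len(observed) or n < 4:
        raise ValueError("need two sequences of equal length, at least 4 items")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher z-transform.

    Assumes |r| < 1 and n > 3; z_crit is the two-sided 95% normal quantile.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical per-region scores (illustrative values, not study data).
    predicted = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.95, 0.05]
    observed = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    r = pearson_r(predicted, observed)
    low, high = fisher_ci(r, len(predicted))
    print(f"r = {r:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With only a handful of regions the interval is wide; it narrows as the number of compared regions grows, since the standard error scales as 1/sqrt(n - 3).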
