Predicting DNA Methylation Susceptibility Using CpG Flanking Sequences

DNA methylation is a chemical modification that adds a methyl group to the fifth carbon of the cytosine pyrimidine ring. In normal cells, CpG dinucleotides are methylated throughout most of the genome. However, CpG islands, short CpG-rich stretches of DNA (500-2000 bp), are commonly unmethylated. During tumorigenesis, by contrast, global demethylation and CpG island hypermethylation are widely observed. De novo hypermethylation at CpG dinucleotides is typically associated with loss of expression of flanking genes, and it is therefore believed to be an alternative to mutation and deletion in the inactivation of tumor suppressor genes. In this paper, we report that sequences flanking CpG sites can be used to predict DNA methylation levels. We measured DNA methylation levels by using a new high-throughput sequencing technology (454) to sequence bisulfite-treated DNA from four types of primary leukemia and lymphoma cells and from normal peripheral blood lymphocytes. After measuring the methylation level at each CpG site, we used 30 bp flanking sequences to characterize methylation susceptibility in terms of character composition and built predictive models for DNA methylation susceptibility, achieving up to 75% prediction accuracy in 10-fold cross-validation tests. Our study is the first of its kind to build predictive models for methylation susceptibility from CpG site-specific methylation levels.
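The abstract does not say which features or classifiers were used. The sketch below is therefore only an illustration of what "character composition" of a flanking sequence and a 10-fold split could look like. The function names `composition_features` and `k_fold_indices`, the choice of k-mer frequencies as features, and the interleaved fold assignment are all assumptions made for this example, not details taken from the study.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def composition_features(flank, k=1):
    """Return the relative frequency of every k-mer over ACGT in `flank`.

    `flank` is assumed to be the sequence flanking one CpG site
    (for example, a 30 bp string). k-mers containing other symbols,
    such as N, are ignored.
    """
    flank = flank.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(flank[i:i + k] for i in range(len(flank) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def k_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists for n_folds-fold cross-validation.

    Samples are assigned to folds in an interleaved fashion; shuffling
    beforehand is left to the caller.
    """
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical flanking sequence, for illustration only.
    print(composition_features("ACGTACGTTTGACCA", k=1))
    for train, test in k_fold_indices(20, n_folds=10):
        print(len(train), len(test))
```

Any classifier could then be trained on the resulting feature vectors within each training fold and scored on the held-out fold. Averaging the per-fold accuracy gives a cross-validated estimate of the kind quoted above.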
