A novel K-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters

DNA methylation is essential for normal cell development and differentiation and plays a crucial role in the development of nearly all types of cancer. Although next-generation sequencing technologies now make it possible to assess human methylomes at base resolution, no reports currently exist on modeling cell type-specific DNA methylation susceptibility. We therefore conducted a comprehensive modeling study of cell type-specific DNA methylation susceptibility at three resolutions: CpG dinucleotides, CpG segments, and individual gene promoter regions. Using a k-mer mixture logistic regression model, we effectively modeled DNA methylation susceptibility across five cell types. The significance of these results is threefold: 1) this is the first report to indicate that CpG methylation-susceptible "segments" exist; 2) our model demonstrates the significance of certain k-mers for the mixture model, potentially highlighting DNA sequence features (k-mers) of differentially methylated promoter CpG island sequences across different tissue types; 3) because only 3 or 4 bp patterns had previously been used for modeling DNA methylation susceptibility, ours is the first demonstration that 6-mer modeling can be performed without loss of accuracy.
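To make the phrase "k-mer mixture logistic regression" concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the model's exact form, so the sketch assumes that features are overlapping k-mer counts in a window flanking each CpG and that the methylation probability is a weighted sum of logistic components. All function names, the window size, and the component parameters are hypothetical.

```python
import math
from collections import Counter


def cpg_positions(seq):
    """Return the 0-based start index of every CpG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def flank_features(seq, pos, flank, k):
    """k-mer counts in a window of `flank` bases on each side of the CpG at pos.

    The window is clipped at the sequence ends (an assumption of this sketch).
    """
    start = max(0, pos - flank)
    end = min(len(seq), pos + 2 + flank)
    return kmer_counts(seq[start:end], k)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mixture_probability(features, components):
    """Assumed mixture form: sum over components of weight * sigmoid(bias + w.x).

    components: list of (mixing_weight, bias, {kmer: coefficient});
    the mixing weights are expected to sum to 1.
    """
    total = 0.0
    for weight, bias, coefs in components:
        z = bias + sum(coefs.get(kmer, 0.0) * n for kmer, n in features.items())
        total += weight * sigmoid(z)
    return total


# Hypothetical usage: score each CpG in a toy promoter fragment.
promoter = "TTACGTTCGA"
components = [(0.5, 0.0, {"CG": 1.0}), (0.5, 0.0, {"TT": -1.0})]
for pos in cpg_positions(promoter):
    x = flank_features(promoter, pos, flank=2, k=2)
    print(pos, round(mixture_probability(x, components), 3))
```

In practice the coefficients and mixing weights would be fitted to bisulfite sequencing data rather than set by hand, and k = 6 would give up to 4^6 = 4096 possible k-mer features per window.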
