CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome

DNA methylation is a heritable chemical modification of cytosine and one of the most important epigenetic events. Computational prediction of DNA methylation status can speed up genome-wide methylation profiling and help identify the key features that are correlated with various methylation patterns. Here, we develop CpGIMethPred, a set of support vector machine-based models that predict the methylation status of CpG islands in the human genome under normal conditions. The features for prediction include those that have previously been shown to be effective (CpG island specific attributes, DNA sequence composition patterns, DNA structure patterns, distribution patterns of conserved transcription factor binding sites and conserved elements, and histone methylation status) as well as those that have not been extensively explored but are likely to contribute additional information from a biological point of view (nucleosome positioning propensities, gene functions, and histone acetylation status). Statistical tests are performed to identify the features that are significantly correlated with the methylation status of the CpG islands, and principal component analysis is then performed to decorrelate the selected features. Data from the Human Epigenome Project (HEP) are used to train, validate, and test the predictive models. Specifically, the models are trained and validated using DNA methylation data obtained from CD4 lymphocytes, and are then tested for generalizability using DNA methylation data obtained from the other 11 normal tissues and cell types.
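The feature screening step described above can be illustrated with a small sketch. The abstract does not say which statistical tests are used, so the two-sample Kolmogorov-Smirnov test here is an assumption, as are the function names, the feature names, the 0.05 threshold, and the asymptotic p-value approximation. It is not the CpGIMethPred implementation.

```python
import math

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov series (an approximation for small samples)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        # The series converges slowly here and the true value is essentially 1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

def select_features(features, alpha=0.05):
    """Keep features whose values differ between methylated and unmethylated islands.

    `features` maps a (hypothetical) feature name to a pair of lists:
    (values in methylated islands, values in unmethylated islands).
    """
    selected = []
    for name, (methylated, unmethylated) in features.items():
        d = ks_statistic(methylated, unmethylated)
        if ks_pvalue(d, len(methylated), len(unmethylated)) < alpha:
            selected.append(name)
    return selected

# Toy data only; the names and numbers are illustrative, not taken from HEP.
toy = {
    "cpg_obs_exp_ratio": ([0.1 + 0.01 * k for k in range(10)],
                          [0.9 + 0.01 * k for k in range(10)]),
    "uninformative": ([0.5 + 0.01 * k for k in range(10)],
                      [0.5 + 0.01 * k for k in range(10)]),
}
kept = select_features(toy)
print(kept)
```

In a full pipeline, the retained features would then be decorrelated with principal component analysis before training the support vector machine; those steps are omitted here.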
Our experiments show that (1) an eight-dimensional feature space, selected via principal component analysis and combining all categories of information, is effective for predicting CpG island methylation status; (2) by incorporating information on nucleosome positioning, gene functions, and histone acetylation, the models achieve higher specificity and accuracy than existing models while maintaining comparable sensitivity; (3) histone modification (methylation and acetylation) information contributes significantly to the prediction, and without it the performance of the models deteriorates; and (4) the predictive models generalize well to different tissues and cell types. The program CpGIMethPred is freely available at http://users.ece.gatech.edu/~hzheng7/CGIMetPred.zip.
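For readers unfamiliar with the three measures compared above, the following minimal sketch shows how sensitivity, specificity, and accuracy are computed from binary predictions. Treating "methylated" as the positive class (label 1) is an assumption, and the function name and toy labels are illustrative only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # methylated island predicted as methylated
            else:
                fn += 1  # methylated island missed
        else:
            if pred == positive:
                fp += 1  # unmethylated island wrongly called methylated
            else:
                tn += 1  # unmethylated island predicted as unmethylated
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / total if total else float("nan")
    return sensitivity, specificity, accuracy

# Toy labels: 1 = methylated, 0 = unmethylated (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec, acc = classification_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because unmethylated CpG islands typically outnumber methylated ones in normal tissue, accuracy alone can be misleading, which is why specificity and sensitivity are reported separately.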
