Bioinformatics, Original Paper
Predicting Methylation Status of CpG Islands in the Human Brain

MOTIVATION: Over 50% of human genes contain CpG islands in their 5'-regions. Methylation patterns of CpG islands are involved in tissue-specific gene expression and regulation. Erroneous epigenetic silencing associated with aberrant CpG island methylation is one mechanism leading to the loss of tumor suppressor functions in cancer cells. Large-scale experimental detection of DNA methylation is still both labor-intensive and time-consuming. It is therefore necessary to develop in silico approaches for predicting the methylation status of CpG islands.

RESULTS: Based on a recent genome-scale dataset of DNA methylation in human brain tissues, we developed a classifier called MethCGI that predicts the methylation status of CpG islands using a support vector machine (SVM). Nucleotide sequence content and transcription factor binding sites (TFBSs) are used as features for the classification. The method achieves a specificity of 84.65% and a sensitivity of 84.32% on the brain data, and it also correctly predicts about two-thirds of the data from other tissues reported in the MethDB database.

AVAILABILITY: An online predictor based on MethCGI is available at http://166.111.201.7/MethCGI.html

CONTACT: mzhang@cshl.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online and at http://166.111.201.7/help.html.
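To make the two technical ingredients of the abstract concrete, the sketch below shows how sequence-content features might be computed for a CpG island and how sensitivity and specificity are derived from predictions. It is a minimal illustration under stated assumptions, not the MethCGI implementation: the function names `sequence_features` and `sensitivity_specificity`, the particular features (GC content, a commonly used observed/expected CpG ratio, dinucleotide frequencies), and the label convention (1 = methylated, 0 = unmethylated) are all hypothetical choices, and the TFBS features and SVM training step are omitted.

```python
from collections import Counter
from itertools import product


def sequence_features(seq):
    """Return simple composition features for one DNA sequence.

    Illustrative feature set only; the features actually used by
    MethCGI are not specified in the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    base_counts = Counter(seq)
    c, g = base_counts["C"], base_counts["G"]

    # Count overlapping dinucleotides (n - 1 windows in total).
    dinuc_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    features = {"gc_content": (c + g) / n}

    # Observed/expected CpG ratio: (CpG count * length) / (C count * G count).
    features["cpg_obs_exp"] = dinuc_counts["CG"] * n / (c * g) if c and g else 0.0

    # Relative frequency of each of the 16 dinucleotides.
    for a, b in product("ACGT", repeat=2):
        features["freq_" + a + b] = dinuc_counts[a + b] / (n - 1)
    return features


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels.

    Assumes 1 marks the positive class (here: methylated) and 0 the
    negative class; this convention is an assumption of the sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    feats = sequence_features("CGCGCG")
    print(feats["gc_content"], feats["cpg_obs_exp"])
    print(sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a full pipeline, feature dictionaries like these would be converted to fixed-length vectors, combined with TFBS-derived features, and passed to an SVM; sensitivity and specificity would then be estimated on held-out CpG islands rather than on the training data.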
