An evaluation of new criteria for CpG islands in the human genome as gene markers

MOTIVATION: More stringent criteria for CpG islands have recently been introduced to exclude Alu repeats. This allows a higher proportion of gene-associated CpG islands to be identified. Using these new criteria, we investigated several types of association between CpG islands and genes to further establish the importance of CpG islands as gene markers.

RESULTS: CpG islands were searched with CpGIE, a Java program developed for CpG island identification, which achieved higher identification accuracy than the other tools compared. About 70% of the identified CpG islands were associated with human genes, and over half of these were located in promoters. Furthermore, 56% of the genes in the confirmed gene model had a CpG island overlapping their transcription start site. The generally accepted criteria identify a large fraction of the Alu repeats within genes as CpG islands. The new criteria filtered out a large fraction of these Alu-derived calls, while very few promoter-associated CpG islands were affected. Genes in the predicted gene model showed no obvious association with CpG islands, suggesting that CpG islands can be used to evaluate the accuracy of gene annotation.

AVAILABILITY: http://bioinfo.hku.hk/cpgieintro
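The abstract contrasts the "generally accepted" CpG island criteria with newer, more stringent ones but does not list the thresholds. The sketch below assumes the widely cited values. For the older criteria it uses those of Gardiner-Garden and Frommer (1987): length of at least 200 bp, GC content of at least 50%, and observed/expected CpG ratio of at least 0.6. For the stricter ones it uses those of Takai and Jones (2002): at least 500 bp, 55% GC, and a ratio of at least 0.65. It only tests whether a single given region passes a threshold set. It is not CpGIE's algorithm, which, like other island finders, scans whole sequences. All names here (`IslandCriteria`, `is_cpg_island`, the two constants) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IslandCriteria:
    """Thresholds a candidate region must meet to be called a CpG island."""
    min_length: int      # minimum region length in bp
    min_gc: float        # minimum fraction of C + G
    min_obs_exp: float   # minimum observed/expected CpG ratio


# Assumed threshold sets (widely cited values; >= is used throughout as a simplification).
GARDINER_GARDEN_FROMMER = IslandCriteria(min_length=200, min_gc=0.50, min_obs_exp=0.60)
TAKAI_JONES = IslandCriteria(min_length=500, min_gc=0.55, min_obs_exp=0.65)


def gc_content(seq: str) -> float:
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0


def obs_exp_cpg(seq: str) -> float:
    """Observed CpG count divided by the expected count (#C * #G / length)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" occurrences cannot overlap, so str.count gives the exact dinucleotide count.
    observed = seq.count("CG")
    return observed * len(seq) / (c * g)


def is_cpg_island(seq: str, criteria: IslandCriteria) -> bool:
    """True if the whole region satisfies all three thresholds."""
    return (
        len(seq) >= criteria.min_length
        and gc_content(seq) >= criteria.min_gc
        and obs_exp_cpg(seq) >= criteria.min_obs_exp
    )
```

For example, a 300 bp region passes the 200 bp length cutoff of the older criteria but fails the 500 bp cutoff of the stricter ones, whatever its composition. Raising both the length and composition thresholds is how stricter criteria can exclude short, GC-rich repeats such as Alu elements. According to the abstract, they do this while sparing most promoter-associated islands.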
