Gene-Ontology analysis reveals association of tissue-specific 5' CpG-island genes with development and embryogenesis.

A key open question in the biology of DNA methylation concerns the origin and function of CpG islands, stretches of GC-rich and relatively CpG-rich DNA sequence that often colocalize with promoter regions. All housekeeping genes, and also a substantial minority of tissue-specific genes, are associated with CpG islands. Limited experimental evidence suggests that CpG islands are associated with promoters or replication origins that are active during early development. This hypothesis is attractive for widely expressed genes, which would be expected to be expressed during early development, but it does not readily explain why many tissue-specific genes also contain CpG islands. In this work, we used a genome-wide Gene-Ontology (GO)-based approach to analyze associations between GO terms and the presence of 5' CpG islands in human Reference Sequence (RefSeq) genes. We found that 19 of the 3849 GO terms with at least one annotated human sequence showed a highly significant association with the likelihood of 5' CpG islands being present in genes annotated to that term. In particular, the term 'development' showed a highly significant increase in the proportion of 5' CpG island genes. The overrepresentation of 5' CpG island genes was even more significant for tissue-specific RefSeqs annotated to development and to many of its descendant terms. In addition, the proportion of expressed sequence tags from embryonic libraries among tissue-specific genes was twice as high for RefSeqs with 5' CpG islands as for those without CpG islands. These results provide strong support for previous speculations that early embryonic expression is associated with CpG islands.
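As an illustration of the kind of per-term association test described above, the following minimal sketch checks whether genes annotated to a GO term carry 5' CpG islands more often than expected by chance. The abstract does not state which statistical test or multiple-testing correction was used; the one-sided hypergeometric test, the Bonferroni correction, the function names, and the toy counts below are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: total genes, K: genes with a 5' CpG island,
    n: genes annotated to the GO term, k: annotated genes with an island.
    """
    denom = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numer / denom


def enriched_terms(term_counts, total_genes, total_cpg, alpha=0.05):
    """Return {term: p} for terms passing a Bonferroni-corrected threshold.

    term_counts maps a GO term to (annotated genes, annotated genes with island).
    The Bonferroni correction is an assumption, not the paper's stated method.
    """
    threshold = alpha / len(term_counts)
    significant = {}
    for term, (n_annotated, n_cpg) in term_counts.items():
        p = hypergeom_upper_tail(n_cpg, n_annotated, total_cpg, total_genes)
        if p <= threshold:
            significant[term] = p
    return significant


# Hypothetical toy counts, not data from the study.
toy_counts = {"development": (8, 8), "other_term": (8, 4)}
toy_result = enriched_terms(toy_counts, total_genes=20, total_cpg=10)
```

With real data, the counts would come from RefSeq genes annotated to each of the GO terms, and the number of tests entering the correction would be the number of terms examined.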
