Overview of gene structure.

Throughout the C. elegans sequencing project, Genefinder was the primary program for predicting protein-coding genes. Curators manually reviewed these initial predictions as part of a "first-pass annotation", and WormBase staff now actively curate the gene models using a variety of data and information. In WormBase data release WS133 there are 22,227 protein-coding genes, including 2,575 alternatively spliced forms. Twenty-eight percent of these have every base of every exon confirmed by transcription evidence, while an additional 51% have some bases confirmed. Most genes are relatively small, covering a genomic region of about 3 kb. The average gene contains 6.4 coding exons, and coding exons together account for about 26% of the genome. Most exons are small and separated by small introns: the median exon size is 123 bases, while the most common intron size is 47 bases. Protein-coding genes are denser on the autosomes than on chromosome X, and denser in the central regions of the autosomes than on the arms. There are only 561 annotated pseudogenes, but several estimates put the true number much higher.
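The distinction between genes with "every base of every exon confirmed" and those with only "some bases confirmed" comes down to interval overlap between a gene model's exons and its supporting transcript alignments. WormBase's actual confirmation pipeline is not described here; the following is only a minimal sketch, assuming exons and transcription evidence (for example, cDNA or EST alignments) are given as half-open genomic intervals on the same strand. The function names and the three status labels are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) genomic coordinates.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(exon: Interval, evidence: List[Interval]) -> int:
    """Count bases of one exon overlapped by the (merged) evidence intervals."""
    total = 0
    for start, end in merge(evidence):
        overlap = min(exon[1], end) - max(exon[0], start)
        if overlap > 0:
            total += overlap
    return total


def confirmation_status(exons: List[Interval], evidence: List[Interval]) -> str:
    """Classify a gene model as 'confirmed', 'partial', or 'unconfirmed'.

    'confirmed'  : every base of every exon is covered by evidence
    'partial'    : some, but not all, exon bases are covered
    'unconfirmed': no exon base is covered (or the model has no exons)
    """
    length = sum(end - start for start, end in exons)
    if length == 0:
        return "unconfirmed"
    covered = sum(covered_bases(e, evidence) for e in exons)
    if covered == length:
        return "confirmed"
    if covered > 0:
        return "partial"
    return "unconfirmed"
```

For a two-exon model `[(100, 200), (250, 300)]`, evidence spanning both exons yields `"confirmed"`, evidence touching only part of the first exon yields `"partial"`, and no evidence yields `"unconfirmed"`. Merging the evidence first keeps overlapping alignments from counting the same base twice.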
