EAnnot: a genome annotation tool using experimental evidence.

The sequence of any genome becomes most useful for biological experimentation when a complete and accurate gene set is available. Gene prediction programs offer an efficient way to generate an automated gene set. Manual annotation, when performed by experienced annotators, is more accurate and complete than automated annotation. However, it is a laborious and expensive process and, by its nature, introduces a degree of variability not found with automated annotation. EAnnot (Electronic Annotation) is a program originally developed to support manual annotation of the human genome. It combines current bioinformatics tools to extract and analyze a wide range of publicly available data in order to achieve fast and reliable automatic gene prediction and annotation. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals. Here, we compare manual annotation of human chromosome 6 with annotation performed by EAnnot in order to assess the latter's accuracy. EAnnot can readily be applied to manual annotation of other eukaryotic genomes and can be used to rapidly obtain an automated gene set.
