A Re-Annotation of the Saccharomyces cerevisiae Genome

Discrepancies in the gene and orphan counts reported by previous analyses suggest that S. cerevisiae would benefit from a consistent re-annotation. In this analysis, three new genes are identified and 46 alterations to gene coordinates are described. A total of 370 ORFs are classified as totally spurious and should be disregarded. At least a further 193 genes could be described as very hypothetical, based on a number of criteria. Disparate genes with sequence overlaps of more than ten amino acids (especially at the N-terminus) were found to be rare in both S. cerevisiae and Sz. pombe. A new S. cerevisiae gene number estimate with an upper limit of 5804 is proposed; after the removal of very hypothetical genes and pseudogenes, this is reduced to 5570. Although this figure is likely to be closer to the true upper limit, it is still predicted to be an overestimate of the gene number. A complete list of revised gene coordinates is available from the Sanger Centre (S. cerevisiae reannotation: ftp://ftp/pub/yeast/SCreannotation).
