An integrated approach for finding overlooked genes in yeast

We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach that integrates gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process. Expressed sequences are first trapped using a modified transposon that produces protein fusions to β-galactosidase (β-gal); non-annotated open reading frames (ORFs) translated as β-gal chimeras are selected as a pool of candidate genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. Complementing this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. Because these methods can identify both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.
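As an illustration of what it means to consider both short ORFs and antisense ORFs, the following minimal sketch enumerates candidate ORFs in all six reading frames of a DNA sequence. It is not the authors' pipeline: the function names, the `Orf` record, and the `min_codons` threshold are hypothetical choices made for this example, and only the standard ATG start codon and the three standard stop codons are assumed.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str    # "+" for the given strand, "-" for the antisense strand
    start: int     # 0-based start in coordinates of the given strand
    end: int       # exclusive end in coordinates of the given strand
    sequence: str  # ORF read 5'->3', including the stop codon


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan_strand(seq: str, strand: str, min_codons: int) -> List[Orf]:
    """Scan the three frames of one strand for ATG...stop ORFs."""
    found = []
    n = len(seq)
    for frame in range(3):
        orf_start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first in-frame start codon opens an ORF
            elif orf_start is not None and codon in STOP_CODONS:
                end = i + 3
                coding_codons = (end - orf_start) // 3 - 1  # stop excluded
                if coding_codons >= min_codons:
                    if strand == "+":
                        found.append(Orf("+", orf_start, end, seq[orf_start:end]))
                    else:
                        # Map reverse-strand positions back to the given strand.
                        found.append(Orf("-", n - end, n - orf_start, seq[orf_start:end]))
                orf_start = None
    return found


def find_orfs(seq: str, min_codons: int = 25) -> List[Orf]:
    """Return ORFs on both strands; min_codons is an illustrative cutoff."""
    seq = seq.upper()
    return (_scan_strand(seq, "+", min_codons)
            + _scan_strand(reverse_complement(seq), "-", min_codons))
```

Lowering `min_codons` admits shorter ORFs at the cost of more spurious candidates, which is why a scan like this can only nominate candidates; evidence of expression or homology is still needed to support them.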
