Bioinformatics and the discovery of gene function.

Scientific history was made in completing the yeast genome sequence, yet its 13 Mb are a mere starting point. Two challenges loom large: to decipher the function of all genes and to describe the workings of the eukaryotic cell in full molecular detail. A combination of experimental and theoretical approaches will be brought to bear on these challenges. What will be next in yeast genome analysis from the point of view of bioinformatics?

Current information status

Functional knowledge about yeast genes is already more advanced than one might have expected at the outset of the sequencing effort. For an amazing 65% of the approximately 6000 protein-encoding genes, we already have some functional information. For about 30% of the total, according to B. Dujon (this issue), functional knowledge is the result of direct experiment, but for a large fraction, about 35% of the total, functional information was derived by homology transfer.

Homology transfer of information exploits the evolutionary continuity in protein function and structure over very long time spans, apparent in our world as the presence of similar genes in viruses, bacteria and eukaryotes. Technically, homology transfer becomes possible because of the two pillars of current bioinformatics: databases that capture the experimental knowledge about gene function in different organisms, and search algorithms that permit the detailed comparison of a new gene with all available database sequences. Whenever sequence similarity is detected at a level that clearly indicates functional or structural homology, information can be transferred from a gene of known function in one species to one of unknown function in another species (or in the same species).

In some cases, the transferred information exquisitely describes the detailed biochemical and/or cellular function of a new gene, for example, that of the yeast open reading frame (ORF) YCR14c on chromosome III (SWISS-PROT Accession No.
P25615) as the functional cousin of the mammalian DNA polymerase β (Ref. 1).

[Figure labels: sequence orphans; probable spurious ORFs; homology, but function unknown.]

In other cases, the power of prediction is very limited, because of strong functional divergence or because the homology is limited to a sequence fragment; an example is the prediction of nucleic acid binding properties based on the presence of a zinc finger motif. In many cases, the prediction of gene function by homology transfer can be easily achieved using standard database search tools. However, trained experts are needed to achieve …