Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis.

The ability to accurately predict gene function from gene sequence is an important tool in many areas of biological research. Such predictions have become particularly important in the genomics age, in which numerous gene sequences are generated with little or no accompanying experimentally determined functional information. Almost all functional prediction methods rely on identifying, characterizing, and quantifying sequence similarity between the gene of interest and genes for which functional information is available. Because sequence is the prime determinant of function, sequence similarity is taken to imply similarity of function. This assumption is undoubtedly valid in most cases. However, sequence similarity does not ensure identical function, and groups of genes that are similar in sequence commonly have diverse (although usually related) functions. Therefore, identifying sequence similarity is frequently not enough to assign a predicted function to an uncharacterized gene; one must also have a method of choosing among similar genes with different functions. In such cases, most functional prediction methods assign likely functions by quantifying the levels of similarity among genes, for example by transferring the function of the most similar characterized gene. I suggest that functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself. It is well established that many aspects of comparative biology can benefit from evolutionary studies (Felsenstein 1985), and comparative molecular biology is no exception (e.g., Altschul et al. 1989; Goldman et al. 1996). In this commentary, I discuss the use of evolutionary information in the prediction of gene function.
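The contrast between "how similar" and "how they became similar" can be made concrete with a toy case. The following is a minimal Python sketch on invented data: the gene names U, A, B, and C, the functions X and Y, and all distances are hypothetical. It assumes the distances are exactly tree-additive. It resolves the four-gene tree with the four-point condition, a standard quartet shortcut used here only for illustration and not a method prescribed by this commentary. A best-hit rule gives U the function of its most similar gene. The tree-based rule gives U the function of the gene it actually groups with. When one lineage evolves faster than the others, the two rules can disagree.

```python
# Hypothetical four-gene example (all names and numbers invented).
# The distances are tree-additive for the unrooted tree ((U,A),(B,C)) with
# branch lengths U=1, A=8, B=1, C=2 and an internal branch of 1. Gene A is
# U's true relative but has evolved quickly, so it is the LEAST similar to U.
DIST = {
    frozenset({"U", "A"}): 9,
    frozenset({"U", "B"}): 3,
    frozenset({"U", "C"}): 4,
    frozenset({"A", "B"}): 10,
    frozenset({"A", "C"}): 11,
    frozenset({"B", "C"}): 3,
}
FUNCTION = {"A": "X", "B": "Y", "C": "Y"}  # hypothetical known functions


def d(g1: str, g2: str) -> int:
    """Look up the symmetric distance between two genes."""
    return DIST[frozenset({g1, g2})]


def best_hit(query: str, known: list[str]) -> tuple[str, str]:
    """Similarity-based rule: copy the function of the closest gene."""
    hit = min(known, key=lambda g: d(query, g))
    return hit, FUNCTION[hit]


def quartet_split(taxa: list[str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Four-point condition: for tree-additive distances on four taxa, the
    pairing wx|yz with the smallest d(w,x) + d(y,z) is the true split."""
    w, x, y, z = taxa
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(pairings, key=lambda p: d(*p[0]) + d(*p[1]))


def tree_based(query: str, known: list[str]) -> tuple[str, str]:
    """Evolution-based rule: copy the function of the gene that pairs with
    the query in the inferred (unrooted) quartet."""
    left, right = quartet_split([query] + known)
    pair = left if query in left else right
    partner = pair[1] if pair[0] == query else pair[0]
    return partner, FUNCTION[partner]


known = ["A", "B", "C"]
print(best_hit("U", known))    # ('B', 'Y')
print(tree_based("U", known))  # ('A', 'X')
```

The tree-based rule picks A even though A is the least similar gene to U. A's long branch (rapid evolution) inflates its distance from U without changing its relationship to U. A real analysis would use many sequences, a general tree-building method, and a rooted tree. The quartet shortcut shown here only applies to four sequences with additive distances.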
To appreciate the potential of a phylogenomic approach to predicting gene function, it is first necessary to discuss how gene sequence is commonly used to predict gene function, as well as some general features of gene evolution.

[1] G. R. Reeck et al. “Homology” in proteins and nucleic acids: a terminology muddle and a way out of it. 1987, Cell.

[2] S. Henikoff et al. Gene families: the taxonomy of protein paralogs and chimeras. 1997, Science.

[3] R. L. Tatusov, E. V. Koonin, and D. J. Lipman. A genomic perspective on protein families. 1997, Science.

[4] W. R. Atchley and W. M. Fitch. A natural classification of the basic helix-loop-helix class of transcription factors. 1997, Proceedings of the National Academy of Sciences of the United States of America.

[5] B. K. Hall (ed.). Homology: the hierarchical basis of comparative biology. 1994, Academic Press.

[6] W. G. Weisburg et al. A phylogenetic analysis of the mycoplasmas: basis for their classification. 1989, Journal of Bacteriology.

[7] J. Gatesy et al. Alignment-ambiguous nucleotide sites and the exclusion of systematic data. 1993, Molecular Phylogenetics and Evolution.

[8] S. Yokoyama et al. Molecular genetic basis of adaptive selection: examples from color vision in vertebrates. 1997, Annual Review of Genetics.

[9] J.-F. Tomb et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. 1997, Nature.

[10] R. Zardoya et al. Evolution and orthology of hedgehog genes. 1996, Trends in Genetics.

[11] S. F. Altschul et al. Weights for data related by a tree. 1989, Journal of Molecular Biology.

[12] N. Goldman, J. L. Thorne, and D. T. Jones. Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses. 1996, Journal of Molecular Biology.

[13] S. Henikoff et al. Superior performance in protein homology detection with the Blocks Database servers. 1998, Nucleic Acids Research.

[14] G. G. Stokes. “J.” 1890, The New Yale Book of Quotations.

[15] J. A. Eisen et al. Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions. 1995, Nucleic Acids Research.

[16] J. Felsenstein. Phylogenies and the comparative method. 1985, The American Naturalist.

[17] R. Raff et al. Developmental genetics and traditional homology. 1996, BioEssays.

[18] S. F. Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997, Nucleic Acids Research.

[19] F. R. Blattner et al. The complete genome sequence of Escherichia coli K-12. 1997, Science.

[20] J. A. Eisen et al. Gastrogenomic delights: a movable feast. 1997, Nature Medicine.

[21] W. M. Fitch. Distinguishing homologous from analogous proteins. 1970, Systematic Zoology.

[22] D. M. Hillis. Homology in molecular biology. 1994, in Homology: the hierarchical basis of comparative biology (ed. B. K. Hall), Academic Press.