Identifying distantly related protein sequences.

The most powerful method available today for inferring the biological function of a gene (or the protein that it encodes) from its sequence is similarity searching on protein and DNA sequence databases. With the development of rapid methods for sequence comparison, using both heuristic algorithms and powerful parallel computers, discoveries based solely on sequence homology have become routine. Indeed, the vast majority of the gene identifications in the recent descriptions of the Haemophilus influenzae (Fleischmann et al., 1995), Mycoplasma genitalium (Fraser et al., 1995), yeast (Dujon, 1996) and Methanococcus jannaschii (Bult et al., 1996) genomes are based only on protein sequence similarity. As more complete genomes become available, protein sequence comparison will become an even more powerful tool for understanding biological function.

[1]  M. S. Waterman, et al. Identification of common molecular subsequences. 1981, Journal of molecular biology.

[2]  R. Fleischmann, et al. The Minimal Gene Complement of Mycoplasma genitalium. 1995, Science.

[3]  A. Bairoch, et al. The SWISS-PROT protein sequence data bank. 1991, Nucleic acids research.

[5]  J. Thompson, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. 1994, Nucleic acids research.

[6]  A. Bairoch. PROSITE: a dictionary of sites and patterns in proteins. 1991, Nucleic acids research.

[7]  R. Fleischmann, et al. Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii. 1996, Science.

[8]  B. Dujon. The yeast genome project: what did we learn? 1996, Trends in genetics: TIG.

[9]  R. Fleischmann, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. 1995, Science.

[10]  W. Pearson. Comparison of methods for searching protein sequence databases. 1995, Protein science: a publication of the Protein Society.

[11]  R. F. Doolittle, et al. Convergent evolution: the need to be explicit. 1994, Trends in biochemical sciences.

[12]  S. Altschul, et al. Issues in searching molecular sequence databases. 1994, Nature Genetics.

[13]  W. Pearson. Effective protein sequence comparison. 1996, Methods in enzymology.

[14]  R. Doolittle. The multiplicity of domains in proteins. 1995, Annual review of biochemistry.