The art of matchmaking: sequence alignment methods and their structural implications.

This work was supported by grants from the National Library of Medicine (LM05205), the National Science Foundation (DBI-9807993), and the Department of Energy (DE-FG02-98ER62558). I thank Tom Plasterer for his help with the figures.

[1]  O. Gotoh  An improved algorithm for matching biological sequences. , 1982, Journal of molecular biology.

[2]  A. Lesk,et al.  The relation between the divergence of sequence and structure in proteins. , 1986, The EMBO journal.

[3]  L. Pauling,et al.  Molecules as documents of evolutionary history. , 1965, Journal of theoretical biology.

[4]  Ying Xu,et al.  An Efficient Computational Method for Globally Optimal Threading , 1998, J. Comput. Biol.

[5]  David Eisenberg,et al.  Inverted protein structure prediction , 1993.

[6]  R. Jernigan,et al.  Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation , 1985.

[7]  W. A. Beyer,et al.  Some Biological Sequence Metrics , 1976.

[8]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[9]  M. E. Welch,et al.  Bayesian analysis of time series and dynamic models , 1990.

[10]  P. Bork,et al.  Protein sequence motifs. , 1996, Current opinion in structural biology.

[11]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[12]  J. Kendrew,et al.  A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis , 1958, Nature.

[13]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[14]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[15]  J. Greer Comparative modeling methods: Application to the family of the mammalian serine proteases , 1990, Proteins.

[16]  S. Oliver,et al.  Erratum: Overview of the yeast genome , 1997, Nature.

[17]  Richard H. Lathrop,et al.  Current Limitations to Protein Threading Approaches , 1997, J. Comput. Biol.

[18]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[19]  Temple F. Smith,et al.  Global optimum protein threading with gapped alignment and empirical pair score functions. , 1996, Journal of molecular biology.

[20]  R. Lathrop The protein threading problem with sequence amino acid interaction preferences is NP-complete. , 1994, Protein engineering.

[21]  W. Taylor,et al.  Multiple sequence threading: an analysis of alignment quality and stability. , 1997, Journal of molecular biology.

[22]  J. Collado-Vides  Integrative Approaches to Molecular Biology , 1996.

[23]  Jérôme Gracy,et al.  Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities , 1998, Bioinform.

[24]  Temple F. Smith,et al.  Multiple domain protein diagnostic patterns , 1996, Protein science : a publication of the Protein Society.

[25]  A. Bairoch PROSITE: a dictionary of sites and patterns in proteins. , 1991, Nucleic acids research.

[26]  M G Rossmann,et al.  Comparison of protein structures. , 1985, Methods in enzymology.

[27]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Temple F. Smith,et al.  Comparison of the complete protein sets of worm and yeast: orthology and divergence. , 1998, Science.

[29]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[30]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[31]  R. F. Smith,et al.  Automatic generation of primary sequence patterns from sets of related protein sequences. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[32]  B. Furie,et al.  On the Construction of Computer Models of Proteins by the Extension of Crystallographic Structures , 1985, Annals of the New York Academy of Sciences.

[33]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965.

[34]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[35]  Jun Zhu,et al.  Bayesian adaptive sequence alignment algorithms , 1998, Bioinform.

[36]  R. Durbin,et al.  Pfam: A comprehensive database of protein domain families based on seed alignments , 1997, Proteins.

[37]  Temple F. Smith,et al.  A homology identification method that combines protein sequence and structure information , 1998, Protein science : a publication of the Protein Society.