A protein taxonomy based on secondary structure

Does a protein's secondary structure determine its three-dimensional fold? This question is tested directly by analyzing proteins of known structure and constructing a taxonomy based solely on secondary structure. The taxonomy is generated automatically, and it takes the form of a tree in which proteins with similar secondary structure occupy neighboring leaves. Our tree is largely in agreement with results from the structural classification of proteins (SCOP), a multidimensional classification based on homologous sequences, full three-dimensional structure, information about chemistry and evolution, and human judgment. Our findings suggest a simple mechanism of protein evolution.
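The abstract does not say how the tree is built, so the following is only a minimal sketch of the general idea, under stated assumptions: each protein is reduced to a three-state secondary-structure string (H = helix, E = strand, C = coil), strings are compared by a global alignment score with hypothetical match, mismatch and gap values, and the resulting distances are clustered by simple average linkage. The function names, scoring values and toy strings are illustrative and are not taken from the paper.

```python
from itertools import combinations


def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two strings (dynamic programming, linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def distance(a, b):
    """0 for identical strings; grows as the alignment score drops."""
    best = max(len(a), len(b))  # score of a perfect match over the longer string
    return 1.0 - align_score(a, b) / best


def build_tree(names, seqs):
    """Average-linkage agglomerative clustering; returns nested tuples of names."""
    # Precompute all pairwise distances between the input strings.
    d = {(i, j): distance(seqs[i], seqs[j])
         for i, j in combinations(range(len(seqs)), 2)}

    def cdist(x, y):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(min(i, j), max(i, j)) for i in x[1] for j in y[1]]
        return sum(d[p] for p in pairs) / len(pairs)

    # Each cluster is (subtree, member indices); start with one leaf per protein.
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        # Merge the two closest clusters into a new internal node.
        x, y = min(combinations(clusters, 2), key=lambda p: cdist(*p))
        clusters = [c for c in clusters if c is not x and c is not y]
        clusters.append(((x[0], y[0]), x[1] + y[1]))
    return clusters[0][0]


# Toy input (made up for illustration): two mostly helical and two mostly
# strand-containing secondary-structure strings.
names = ["helical_a", "helical_b", "sheet_a", "sheet_b"]
seqs = ["HHHHHHCCHHHHHH", "HHHHHCCCHHHHHH",
        "EEEECCEEEECCEEEE", "EEEECCCEEEECEEEE"]

tree = build_tree(names, seqs)
print(tree)
```

On this toy input the two helical strings end up as neighboring leaves, as do the two strand-rich strings, which is the sense in which proteins with similar secondary structure occupy neighboring leaves. A real analysis would need a carefully chosen scoring scheme and tree-building method; the ones here are placeholders.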
