Identification and application of the concepts important for accurate and reliable protein secondary structure prediction

A method for predicting protein secondary structure from multiply aligned homologous sequences is presented, with an overall per‐residue three‐state accuracy of 70.1%. There are two aims: to obtain high accuracy by identifying a set of concepts important for prediction and then applying linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequences, moments of conservation, auto‐correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto‐correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of >80%. Existing high‐accuracy prediction methods are “black‐box” predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium‐ to short‐length chains (≥90 residues and <170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method alone, with an estimated overall three‐state accuracy of 72.4%, the highest accuracy reported for any prediction method.
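The accuracies quoted above are per‐residue three‐state figures. As a minimal sketch of how such a score is conventionally computed, assuming the usual single‐letter states H (helix), E (strand), and C (coil), the following Python function returns the percentage of residues whose predicted state matches the observed one. The function name `q3_accuracy` and the example strings are hypothetical and are not taken from the paper.

```python
# Hypothetical helper: per-residue three-state accuracy (often called Q3).
# Assumes both strings use the states H (helix), E (strand), C (coil).

VALID_STATES = frozenset("HEC")


def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose predicted state is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    if not (set(predicted) <= VALID_STATES and set(observed) <= VALID_STATES):
        raise ValueError("states must be one of H, E, C")

    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Illustrative strings only; 8 of the 10 positions agree.
    print(q3_accuracy("HHHHCCEEEE", "HHHCCCEEEC"))
```

An overall figure over many chains can be obtained either by pooling all residues or by averaging per‐chain scores; the two choices generally give different numbers, so the one used should be stated alongside any reported accuracy.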

[1]  K Biedermann,et al.  Low resolution structure of adenylate kinase. , 1973, Journal of molecular biology.

[2]  V. Lim Algorithms for prediction of α-helical and β-structural regions in globular proteins , 1974 .

[3]  V. Lim Algorithms for prediction of alpha-helical and beta-structural regions in globular proteins. , 1974, Journal of molecular biology.

[4]  P. Y. Chou,et al.  Prediction of protein conformation. , 1974, Biochemistry.

[5]  V. Lim Structural principles of the globular organization of protein chains. A stereochemical theory of globular protein secondary structure. , 1974, Journal of molecular biology.

[6]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[7]  B. Robson,et al.  Conformational properties of amino acid residues in globular proteins. , 1976, Journal of molecular biology.

[8]  J. Garnier,et al.  Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. , 1978, Journal of molecular biology.

[9]  Georg E. Schulz,et al.  Principles of Protein Structure , 1979 .

[10]  R J Fletterick,et al.  Secondary structure assignment for alpha/beta proteins by a combinatorial approach. , 1983, Biochemistry.

[11]  J M Thornton,et al.  Amino and carboxy-terminal regions in globular proteins. , 1983, Journal of molecular biology.

[12]  D. Eisenberg Three-dimensional structure of membrane and surface proteins. , 1984, Annual review of biochemistry.

[13]  Donald Michie Technology Lecture: The superarticulacy phenomenon in the context of software manufacture , 1986, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences.

[14]  Donald Michie,et al.  The superarticulacy phenomenon in the context of software manufacture , 1990 .

[15]  J. Gibrat,et al.  Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. , 1987, Journal of molecular biology.

[16]  R. Williams,et al.  Secondary structure predictions and medium range interactions. , 1987, Biochimica et biophysica acta.

[17]  M. Sternberg,et al.  Prediction of protein secondary structure and active sites using the alignment of homologous sequences. , 1987, Journal of molecular biology.

[18]  J. Gibrat,et al.  Secondary structure prediction: combination of three different methods. , 1988, Protein engineering.

[19]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[20]  J. Richardson,et al.  Amino acid preferences for specific locations at the ends of alpha helices. , 1988, Science.

[21]  S. Benner Patterns of divergence in homologous proteins as indicators of tertiary and quaternary structure. , 1989, Advances in enzyme regulation.

[22]  R Langridge,et al.  Improvements in protein secondary structure prediction by an enhanced neural network. , 1990, Journal of molecular biology.

[23]  M J Sternberg,et al.  Machine learning approach for the prediction of protein secondary structure. , 1990, Journal of molecular biology.

[24]  Robert L. Baldwin,et al.  Relative helix-forming tendencies of nonpolar amino acids , 1990, Nature.

[25]  Sholom M. Weiss,et al.  Computer Systems That Learn , 1990 .

[26]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[27]  S. Benner,et al.  Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: a prediction of the structure of the catalytic domain of protein kinases. , 1991, Advances in enzyme regulation.

[28]  Fred E. Cohen,et al.  β-Breakers: An aperiodic secondary structure , 1991 .

[29]  Stephen Muggleton,et al.  Protein secondary structure prediction using logic-based machine learning , 1992 .

[30]  J. Mesirov,et al.  Hybrid system for protein secondary structure prediction. , 1992, Journal of molecular biology.

[31]  Michael J. E. Sternberg,et al.  Secondary structure prediction , 1992, Current Opinion in Structural Biology 2:237–241.

[32]  Mark A. Cohen,et al.  Correct structure prediction? , 1992, Nature.

[33]  A. Fersht,et al.  Alpha-helix stability in proteins. II. Factors that influence stability at an internal position. , 1992, Journal of molecular biology.

[34]  L Serrano,et al.  Alpha-helix stability in proteins. I. Empirical correlations concerning substitution of side-chains at the N and C-caps and the replacement of alanine by glycine or serine at solvent-exposed surfaces. , 1992, Journal of molecular biology.

[35]  A. Fersht,et al.  α-Helix stability in proteins , 1992 .

[36]  S H White,et al.  Amino acid preferences of small proteins. Implications for protein stability and evolution. , 1992, Journal of molecular biology.

[37]  N. Colloc'h,et al.  Comparison of three algorithms for the assignment of secondary structure in proteins: the advantages of a consensus assignment. , 1993, Protein engineering.

[38]  E. Lander,et al.  Protein secondary structure prediction using nearest-neighbor methods. , 1993, Journal of molecular biology.

[39]  S A Benner,et al.  Predicting the conformation of proteins man versus machine , 1993, FEBS letters.

[40]  B. Rost,et al.  Secondary structure prediction of all-helical proteins in two states. , 1993, Protein engineering.

[41]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[42]  G. Barton,et al.  The limits of protein secondary structure prediction accuracy from multiple sequence alignment. , 1993, Journal of molecular biology.

[43]  David L. Dowe,et al.  A decision graph explanation of protein secondary structure prediction , 1993, [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.

[44]  B. Matthews,et al.  Structural basis of amino acid alpha helix propensity. , 1993, Science.

[45]  Shoshana J. Wodak,et al.  Generating and testing protein folds , 1993 .

[46]  B. Rost,et al.  Redefining the goals of protein secondary structure prediction. , 1994, Journal of molecular biology.

[47]  S A Benner,et al.  Evaluating predictions of secondary structure in proteins. , 1994, Biochemical and biophysical research communications.

[48]  Victor V. Solovyev,et al.  Predicting alpha-helix and beta-strand segments of globular proteins , 1994, Comput. Appl. Biosci..

[49]  C Geourjon,et al.  SOPM: a self-optimized method for protein secondary structure prediction. , 1994, Protein engineering.

[50]  T L Blundell,et al.  Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures. , 1994, Journal of molecular biology.

[51]  David J. Spiegelhalter,et al.  Machine Learning, Neural and Statistical Classification , 2009 .

[52]  Michael S. Waterman,et al.  RNA Secondary Structure , 1995 .

[53]  P. K. Mehta,et al.  A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70% , 1995, Protein science : a publication of the Protein Society.

[54]  A A Salamov,et al.  Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. , 1995, Journal of molecular biology.

[55]  Cao Feng,et al.  STATLOG: COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS , 1995 .

[56]  W. DeGrado,et al.  Protein Design: A Hierarchic Approach , 1995, Science.

[57]  Huan Liu,et al.  Book review: Machine Learning, Neural and Statistical Classification Edited by D. Michie, D.J. Spiegelhalter and C.C. Taylor (Ellis Horwood Limited, 1994) , 1996, SGAR.