A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space

The development of prediction methods based on statistical theory generally has two parts: exploring new algorithms and improving the training database. The current study aims to improve the prediction of protein structural classes on both fronts. On the algorithmic side, a method has been developed that uses a covariance matrix to account for the coupling among the different amino acid components of a protein. To improve the training database, proteins are selected so that they (1) include as many non‐homologous structures as possible and (2) have good structural quality. On this basis, 129 representative proteins are selected and classified into 30 α, 30 β, 30 α + β, 30 α/β, and 9 ζ (irregular) proteins according to a new criterion that better reflects the features of the structural classes concerned. The average prediction accuracy of the current method is 99.2% for the 4 × 30 regular proteins and 95.3% for 64 independent testing proteins not included in the training database. To further validate the method, a jackknife analysis has been performed for the current method as well as for previous ones, and the results also strongly favor the current method. To complete the mathematical basis, a theorem that is instructive for understanding the novel method at a deeper level is presented and proved in Appendix A. © 1995 Wiley‐Liss, Inc.
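
The idea of classifying a protein by its amino acid composition while letting a covariance matrix capture the coupling among components can be illustrated with a short sketch. It assumes the covariance-based measure is a Mahalanobis-type squared distance to each class mean, which the abstract does not state explicitly. The function names, the small `ridge` term added to keep the covariance matrix invertible, and the toy data in the usage note are all hypothetical and are not taken from the paper. One of the 20 composition components is dropped because the components sum to 1, which gives the (20–1)-D space.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return 19 of the 20 amino acid frequencies; the 20th equals 1 minus their sum."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS[:-1]]

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, ridge=0.0):
    """Sample covariance matrix; `ridge` (an assumption) is added to the diagonal."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, diff)  # y = cov^{-1} diff, without forming the inverse
    return sum(d * yi for d, yi in zip(diff, y))

def predict(x, training, ridge=1e-6):
    """Assign x to the class whose mean is nearest in Mahalanobis distance."""
    best_label, best_d2 = None, None
    for label, rows in training.items():
        d2 = mahalanobis_sq(x, mean_vector(rows), covariance(rows, ridge))
        if best_d2 is None or d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label
```

In use, `training` would map each class label (for example α, β, α + β, α/β) to the list of `composition(...)` vectors of its training proteins, and `predict(composition(query), training)` would return the nearest class. With only about 30 proteins per class in a 19-dimensional space, the sample covariance can be poorly conditioned, which is why this sketch adds a small diagonal term; how the paper itself handles this is not stated in the abstract.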
