Prediction of protein (domain) structural classes based on amino-acid index.

A protein (domain) is usually classified into one of four structural classes: all-alpha, all-beta, alpha/beta and alpha + beta. In this paper, a new formulation is proposed for predicting the structural class of a protein (domain) from its primary sequence. Instead of the amino-acid composition widely used in previous structural class prediction work, the prediction uses auto-correlation functions based on the profile of an amino-acid index along the primary sequence of the query protein (domain). As a result, the overall predictive accuracy improves markedly. For the same training database of 359 proteins (domains) and the same component-coupled algorithm [Chou, K.C. & Maggiora, G.M. (1998) Protein Eng. 11, 523-538], the overall predictive accuracy of the new method in the jackknife test is 5-7% higher than the accuracy obtained from the amino-acid composition alone. The overall predictive accuracy finally obtained in the jackknife test is as high as 90.5%, which implies that a significant improvement is achieved by making full use of the information contained in the primary sequence for the class prediction. The size of this improvement depends on the size of the training database, the auto-correlation functions selected and the amino-acid index used. We found that the amino-acid index proposed by Oobatake and Ooi, i.e. the average nonbonded energy per residue, gives the best predictive result for the database sets studied in this paper. This study may be considered an alternative step towards making structural class prediction more practical.
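The feature construction described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the auto-correlation functions, so the code assumes one common definition, r_n = (1/(L-n)) * sum over i of h_i * h_(i+n), where h_i is the index value of residue i and L is the sequence length. The names `TOY_INDEX`, `index_profile` and `autocorrelation` are hypothetical, and the index values are placeholders chosen for illustration only; they are not the Oobatake and Ooi nonbonded energies.

```python
from typing import Dict, List

# Hypothetical placeholder index for a few residues.
# These are NOT the Oobatake-Ooi nonbonded energy values.
TOY_INDEX: Dict[str, float] = {
    "A": 0.5, "G": 0.0, "L": 1.0, "K": -1.0, "E": -0.5,
}


def index_profile(sequence: str, index: Dict[str, float]) -> List[float]:
    """Map each residue of the sequence to its amino-acid index value."""
    return [index[residue] for residue in sequence]


def autocorrelation(profile: List[float], max_lag: int) -> List[float]:
    """Return r_1 .. r_max_lag, assuming r_n = mean of h_i * h_(i+n)."""
    length = len(profile)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        # Sum the products of index values that are `lag` residues apart.
        total = sum(profile[i] * profile[i + lag] for i in range(length - lag))
        features.append(total / (length - lag))
    return features


# Usage: a short toy sequence turned into a fixed-length feature vector.
profile = index_profile("ALKA", TOY_INDEX)
features = autocorrelation(profile, max_lag=2)
```

Under this assumption, sequences of any length map to a vector of `max_lag` values, which could then be passed to a classifier in place of, or alongside, the 20-component amino-acid composition. How the paper combines these features with the component-coupled algorithm is not stated in the abstract.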
