Prediction of protein structural classes by a new measure of information discrepancy

Since the structural class of a protein was first observed to be related to its amino acid composition, various composition-based methods have been proposed to predict protein structural classes. Although these methods are effective to some degree, their predictive quality is limited because amino acid composition alone cannot capture all the information in a protein sequence. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, the new approach is based on comparisons of subsequence distributions, so the effect of residue order on protein structure is taken into account. On the same data set, the new approach gives better predictive results than the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4% and 75.02%, respectively, and similar results are obtained for other data sets. All tests demonstrate that residue order along protein sequences plays an important role in the recognition of protein structural classes, especially for alpha/beta proteins and alpha+beta proteins. In addition, the tests show that the new method is simple and efficient.
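The idea of comparing subsequence distributions rather than amino acid compositions can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the discrepancy below is a symmetric, entropy-style measure (each distribution compared against the pairwise mean), chosen only for illustration and not necessarily the measure used in the paper; the function names, the nearest-class decision rule, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import log


def subsequence_distribution(sequences, k):
    """Relative frequencies of overlapping length-k subsequences,
    pooled over one or more sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no subsequences of length k in the input")
    return {sub: n / total for sub, n in counts.items()}


def discrepancy(p, q):
    """Assumed illustrative measure: sum over both distributions of
    x * log(x / mean), where mean is the average of p and q.
    It is 0 for identical distributions and 2*log(2) for disjoint ones."""
    total = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mean = (pi + qi) / 2
        if pi > 0:
            total += pi * log(pi / mean)
        if qi > 0:
            total += qi * log(qi / mean)
    return total


def predict_class(query, class_sequences, k=2):
    """Hypothetical decision rule: assign the query to the class whose
    pooled subsequence distribution has the smallest discrepancy."""
    q = subsequence_distribution([query], k)
    scores = {
        label: discrepancy(q, subsequence_distribution(seqs, k))
        for label, seqs in class_sequences.items()
    }
    return min(scores, key=scores.get)


# Two toy sequences with identical composition but different residue order.
same_composition = discrepancy(
    subsequence_distribution(["AALL"], 1),
    subsequence_distribution(["ALAL"], 1),
)
different_order = discrepancy(
    subsequence_distribution(["AALL"], 2),
    subsequence_distribution(["ALAL"], 2),
)

# Toy training data, not real proteins.
toy_classes = {"alpha": ["AALLAALL"], "beta": ["VTVTVTVT"]}
predicted = predict_class("ALLAAL", toy_classes, k=2)
```

With k = 1 the two toy sequences are indistinguishable, since both are half A and half L, whereas with k = 2 their distributions differ. This is the sense in which subsequence distributions carry residue-order information that composition alone discards.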
