Prediction of the helix/strand content of globular proteins based on their primary sequences.

An improved multiple linear regression method is proposed for predicting the alpha-helix and beta-strand content of a globular protein from its primary sequence. The algorithm takes into account the amino acid composition and the auto-correlation functions derived from the hydrophobicity profile of the sequence. In the resubstitution test, the average absolute errors are 0.077 and 0.073, with standard deviations of 0.059 and 0.057, for the predicted alpha-helix and beta-strand content, respectively. In the more stringent jackknife cross-validation test, the average absolute errors are 0.087 and 0.081, with standard deviations of 0.067 and 0.065, respectively. Both tests indicate the self-consistency and the extrapolating effectiveness of the new algorithm. These results greatly improve on those reported previously (Eisenhaber, F., Imperiale, F., Argos, P. and Frommel, C., 1996, Proteins, 25, 157-168). Compared with other currently available methods, ours is simpler and easier to use and achieves higher prediction accuracy. The only input required is the primary sequence of the query protein. The program is available on request via e-mail: ctzhang@tju.edu.cn.
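The ingredients named in the abstract (composition features, hydrophobicity auto-correlation, a linear regression fit, and the resubstitution and jackknife tests) can be illustrated with a minimal Python sketch. It is not the authors' program. The Kyte-Doolittle scale, the auto-correlation definition r_k = (1/(N-k)) * sum of h_i * h_(i+k), the default number of lags, and all function names are assumptions made here for illustration only.

```python
# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not say
# which hydrophobicity scale the method uses.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """ASSUMED form: r_k = mean of h_i * h_(i+k), for k = 1..max_lag."""
    h = [KD[a] for a in seq]  # raises KeyError on non-standard residues
    n = len(h)
    return [sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
            for k in range(1, max_lag + 1)]


def features(seq, max_lag=3):
    """Feature vector: 20 composition fractions plus max_lag correlations."""
    return composition(seq) + autocorrelation(seq, max_lag)


def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination. Fails with ZeroDivisionError if X^T X is singular."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def resubstitution_errors(X, y):
    """Fit on all proteins, then predict the same proteins."""
    w = fit_least_squares(X, y)
    return [abs(predict(w, x) - t) for x, t in zip(X, y)]


def jackknife_errors(X, y):
    """Leave-one-out: refit without protein i, then predict protein i."""
    errors = []
    for i in range(len(X)):
        w = fit_least_squares(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(predict(w, X[i]) - y[i]))
    return errors
```

Here `X` would hold one `features(seq)` row per protein of known structure and `y` the observed helix (or strand) fraction. The mean and standard deviation of the returned absolute errors correspond to the kind of figures quoted above. Because each jackknife prediction is made for a protein excluded from the fit, its errors are expected to be somewhat larger than the resubstitution errors.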
