Improvement of protein secondary structure prediction using binary word encoding

We propose a binary word encoding to improve protein secondary structure prediction. The encoding maps a local amino acid sequence to a binary word, a string of 0s and 1s, by applying an encoding function that assigns each amino acid to 0 or 1. With this encoding, we can statistically extract multiresidue information, that is, information that depends on more than one residue. We combine the binary word encoding with the GOR method, with a modified version of GOR that shows better accuracy, and with the neural network method. The binary word encoding improves the accuracy of GOR by 2.8%, and we obtain similar improvements when we combine it with the modified GOR method and the neural network method. The encoding similarly improves the accuracy when we use multiple sequence alignment data. The accuracy of our best combined method is 68.2%. Because this paper shows improvement only for the GOR and neural network methods, we cannot say that the encoding improves other methods. However, the improvement suggests that multiresidue interactions affect the formation of secondary structure. In addition, we find that the optimal encoding function obtained by simulated annealing relates to nonpolarity. This means that nonpolarity is important to the multiresidue interaction. Proteins 27:36–46 © 1997 Wiley‐Liss, Inc.
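
As a rough illustration of the encoding described in the abstract, the following Python sketch maps each residue to 0 or 1, slides a window along a sequence to form binary words, and counts how often each word co-occurs with the secondary structure state of the window's central residue. The residue set `NONPOLAR`, the window width of 5, the state labels, and all function names are assumptions made for this example; the abstract does not give the optimized encoding function, the window length, or the way the counts enter the GOR or neural network predictors.

```python
from collections import Counter

# Hypothetical encoding function: 1 for residues in this nonpolar set, 0 otherwise.
# This set is an assumption for illustration, not the paper's optimized encoding.
NONPOLAR = set("AVLIMFWC")


def encode_residue(aa):
    """Map a one-letter amino acid code to 0 or 1."""
    return 1 if aa in NONPOLAR else 0


def binary_word(segment):
    """Encode a local amino acid segment as a string of 0s and 1s."""
    return "".join(str(encode_residue(aa)) for aa in segment)


def binary_words(sequence, width=5):
    """Return the binary word for every window of the given width (width is assumed)."""
    return [binary_word(sequence[i:i + width])
            for i in range(len(sequence) - width + 1)]


def word_state_counts(sequence, states, width=5):
    """Count (binary word, state of the central residue) pairs along one sequence."""
    half = width // 2
    counts = Counter()
    for i, word in enumerate(binary_words(sequence, width)):
        counts[(word, states[i + half])] += 1
    return counts


if __name__ == "__main__":
    seq = "ALKEVG"      # toy sequence
    sst = "HHHHCC"      # toy per-residue states (H = helix, C = coil)
    print(binary_words(seq))
    print(word_state_counts(seq, sst))
```

Because a binary word summarizes several residues at once, counts of this kind capture statistics that depend on more than one residue. With only two symbols per position, a width-5 window yields at most 32 distinct words, so each word can be observed often enough to estimate its association with a structure state.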
