Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments.

The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures alpha-helix, beta-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic periodicity in alpha-helices. Since these single-structure networks all have less than 600 adjustable weights, overfitting is avoided. To obtain a three-state prediction of alpha-helix, beta-strand, or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using 7-fold cross-validation on a database of 126 nonhomologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthew's correlation coefficients C alpha = 0.59, C beta = 0.52, and Cc = 0.50. More than 72% of the residues in the database are predicted with an accuracy of 80%. It is shown that the network outputs can be interpreted as estimated probabilities of correct prediction, and, therefore, these numbers indicate which residues are predicted with high confidence.

[1]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[2]  P. Y. Chou,et al.  Prediction of the secondary structure of proteins from their amino acid sequence. , 2006 .

[3]  J. Garnier,et al.  Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. , 1978, Journal of molecular biology.

[4]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[5]  W. Taylor,et al.  The classification of amino acid conservation. , 1986, Journal of theoretical biology.

[6]  J. Gibrat,et al.  Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. , 1987, Journal of molecular biology.

[7]  M. Sternberg,et al.  Prediction of protein secondary structure and active sites using the alignment of homologous sequences. , 1987, Journal of molecular biology.

[8]  J. Garnier,et al.  Improvements in a secondary structure prediction method based on a search for local sequence homologies and its use as a model building tool. , 1988, Biochimica et biophysica acta.

[9]  J. Gibrat,et al.  Secondary structure prediction: combination of three different methods. , 1988, Protein engineering.

[10]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[11]  S. Brunak,et al.  Protein secondary structure and homology by neural networks The α‐helices in rhodopsin , 1988 .

[12]  S F Altschul,et al.  Weights for data related by a tree. , 1989, Journal of molecular biology.

[13]  M. Karplus,et al.  Protein secondary structure prediction with a neural network. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Martin Vingron,et al.  A fast and sensitive multiple sequence alignment algorithm , 1989, Comput. Appl. Biosci..

[15]  John S. Bridle,et al.  Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.

[16]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[17]  G. Fasman The Prediction of the Secondary Structure of Proteins , 1990 .

[18]  R Langridge,et al.  Improvements in protein secondary structure prediction by an enhanced neural network. , 1990, Journal of molecular biology.

[19]  P. Argos,et al.  Weighting aligned protein or nucleic acid sequences to correct for unequal representation. , 1990, Journal of molecular biology.

[20]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[21]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[22]  Richard Lippmann,et al.  Neural Network Classifiers Estimate Bayesian a posteriori Probabilities , 1991, Neural Computation.

[23]  J. Mesirov,et al.  Hybrid system for protein secondary structure prediction. , 1992, Journal of molecular biology.

[24]  P Stolorz,et al.  Predicting protein secondary structure using neural net and statistical methods. , 1992, Journal of molecular biology.

[25]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[26]  Limits on alpha-helix prediction with neural network models. , 1992, Proteins.

[27]  B. Rost,et al.  Improved prediction of protein secondary structure by use of sequence profiles and neural networks. , 1993, Proceedings of the National Academy of Sciences of the United States of America.

[28]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[29]  G. Barton,et al.  The limits of protein secondary structure prediction accuracy from multiple sequence alignment. , 1993, Journal of molecular biology.

[30]  P. Argos,et al.  Quantification of secondary structure prediction improvement using multiple alignments. , 1993, Protein engineering.

[31]  B. Rost,et al.  Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.

[32]  Burkhard Rost,et al.  PHD - an automatic mail server for protein secondary structure prediction , 1994, Comput. Appl. Biosci..

[33]  Anders Krogh,et al.  Neural Network Ensembles, Cross Validation, and Active Learning , 1994, NIPS.

[34]  S. Henikoff,et al.  Position-based sequence weights. , 1994, Journal of molecular biology.

[35]  C. Chothia,et al.  Volume changes in protein evolution. , 1994, Journal of molecular biology.

[36]  Anders Krogh,et al.  Maximum Entropy Weighting of Aligned Sequences of Proteins or DNA , 1995, ISMB.

[37]  David S. Touretzky,et al.  Learning with Ensembles: How Over--tting Can Be Useful , 1996 .