Predicting protein secondary structure using neural net and statistical methods.

A comparison of neural network methods and Bayesian statistical methods is presented for prediction of the secondary structure of proteins given their primary sequence. The Bayesian method makes the unphysical assumption that the probability of an amino acid occurring in each position in the protein is independent of the amino acids occurring elsewhere. Nevertheless, we find the predictive accuracy of the Bayesian method to be only slightly lower than that of the most sophisticated methods used to date. We present the relationship of neural network methods to Bayesian statistical methods and show that, in principle, neural methods offer considerable power, although apparently they are not particularly useful for this problem. In the process, we derive a neural formalism in which the output neurons directly represent the conditional probabilities of structure class. The probabilistic formalism allows introduction of a new objective function, the mutual information, which translates the notion of correlation as a measure of predictive accuracy into a useful training measure. Although this new measure achieves an accuracy similar to that of other approaches (which use a mean-square error), the accuracy on the training set is significantly and tantalizingly higher, even though the number of adjustable parameters remains the same. The mutual information measure predicts a greater fraction of helix and sheet structures correctly than the mean-square error measure, at the expense of coil accuracy, precisely as it was designed to do. By combining the two objective functions, we obtain a marginally improved accuracy of 64.4%, with Matthews coefficients C_α, C_β and C_coil of 0.40, 0.32 and 0.42, respectively.
However, since all methods to date perform only slightly better than the Bayes algorithm, which entails the drastic assumption of independence of amino acids, one is forced to conclude that little progress has been made on this problem, despite the application of a variety of sophisticated algorithms such as neural networks, and that further advances will require a better understanding of the relevant biophysics.
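
To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) a Bayes classifier that treats the residue at each window position as independent of the others given the structure class, and (b) a per-class Matthews correlation coefficient for scoring predictions. The class letters H/E/C (helix, sheet, coil), the window half-width, the add-one smoothing, and all function names are illustrative assumptions that do not come from the paper.

```python
import math
from collections import defaultdict

CLASSES = ("H", "E", "C")  # assumed labels: helix, sheet, coil


def train_counts(sequences, structures, half_window=2):
    """Count how often each residue appears at each window offset, per class."""
    class_counts = defaultdict(int)    # key: class label
    residue_counts = defaultdict(int)  # key: (class label, offset, residue)
    for seq, ss in zip(sequences, structures):
        for i, label in enumerate(ss):
            class_counts[label] += 1
            for off in range(-half_window, half_window + 1):
                j = i + off
                if 0 <= j < len(seq):
                    residue_counts[(label, off, seq[j])] += 1
    return class_counts, residue_counts


def predict(seq, class_counts, residue_counts, half_window=2, alphabet_size=20):
    """Assign each position the class with the highest log posterior score."""
    total = sum(class_counts.values())
    predicted = []
    for i in range(len(seq)):
        best_label, best_score = None, -math.inf
        for label in CLASSES:
            n = class_counts[label]
            # Log prior with add-one smoothing (an assumption of this sketch).
            score = math.log((n + 1) / (total + len(CLASSES)))
            # Independence assumption: per-offset likelihoods simply multiply,
            # so their logs add. Using n as the denominator slightly
            # overcounts windows truncated at the ends of a sequence.
            for off in range(-half_window, half_window + 1):
                j = i + off
                if 0 <= j < len(seq):
                    c = residue_counts[(label, off, seq[j])]
                    score += math.log((c + 1) / (n + alphabet_size))
            if score > best_score:
                best_label, best_score = label, score
        predicted.append(best_label)
    return "".join(predicted)


def matthews(observed, predicted, label):
    """Matthews correlation coefficient for one class against all others."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if p == label and o == label:
            tp += 1
        elif p == label:
            fp += 1
        elif o == label:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a class is never
    # predicted or never observed).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A typical use would be to call `train_counts` on a set of sequences with known structure strings, run `predict` on held-out sequences, and then report `matthews(observed, predicted, label)` separately for each of the three classes, which mirrors how the three coefficients in the abstract are reported.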
