Protein Secondary Structure Prediction with Bidirectional Recurrent Neural Nets: Can Weight Updating for Each Residue Enhance Performance?

Successful protein secondary structure prediction is an important step towards modelling protein 3D structure, with several practical applications. Even though in the last four decades several PSSP algorithms have been proposed, we are far from being accurate. The Bidirectional Recurrent Neural Network (BRNN) architecture of Baldi et al. [1] is currently considered as one of the optimal computational neural network type architectures for addressing the problem. In this paper, we implement the same BRNN architecture, but we use a modified training procedure. More specifically, our aim is to identify the effect of the contribution of local versus global information, by varying the length of the segment on which the Recurrent Neural Networks operate for each residue position considered. For training the network, the backpropagation learning algorithm with an online training procedure is used, where the weight updates occur for every amino acid, as opposed to Baldi et al. [1], where the weight updates are applied after the presentation of the entire protein. Our results with a single BRNN are better than Baldi et al. [1] by three percentage points (Q3) and comparable to results of [1] when they use an ensemble of 6 BRNNs. In addition, our results improve even further when sequence-to-structure output is filtered in a post-processing step, with a novel Hidden Markov Model-based approach.

[1]  Narendra S. Chaudhari,et al.  Cascaded Bidirectional Recurrent Neural Networks for Protein Secondary Structure Prediction , 2007, IEEE ACM Trans. Comput. Biol. Bioinform..

[2]  Volker A. Eyrich,et al.  EVA: Large‐scale analysis of secondary structure prediction , 2001, Proteins.

[3]  Pierre Baldi,et al.  Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles , 2002, Proteins.

[4]  Geoffrey E. Hinton,et al.  Learning representations by back-propagation errors, nature , 1986 .

[5]  U. Hobohm,et al.  Selection of representative protein data sets , 1992, Protein science : a publication of the Protein Society.

[6]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[7]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[8]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[9]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[10]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[11]  A. Yonath,et al.  From peptide‐bond formation to cotranslational folding: dynamic, regulatory and evolutionary aspects , 2005, FEBS letters.

[12]  Giovanni Soda,et al.  Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..

[13]  F. Richards,et al.  Identification of structural motifs from protein coordinate data: Secondary structure and first‐level supersecondary structure * , 1988, Proteins.

[14]  B. Rost,et al.  Improved prediction of protein secondary structure by use of sequence profiles and neural networks. , 1993, Proceedings of the National Academy of Sciences of the United States of America.

[15]  C. Sander,et al.  The HSSP database of protein structure-sequence alignments. , 1994, Nucleic acids research.

[16]  A A Salamov,et al.  Protein secondary structure prediction using local alignments. , 1997, Journal of molecular biology.

[17]  B. Rost,et al.  A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment , 1999, Proteins.

[18]  B. Rost,et al.  Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.

[19]  C. Anfinsen Principles that govern the folding of protein chains. , 1973, Science.