Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of neural networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes "time". The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in a supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. The increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state of the art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components.
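
The level-by-level refinement described above can be illustrated with a minimal sketch. It is not the architecture used in this work: it assumes a single logistic unit per level shared across all positions (i, j), a 3x3 window over the previous level's map, an all-zero starting map, and synthetic toy data. The names `neighborhood_features`, `train_level`, and `train_stack` are hypothetical. What it does show is the key idea that each level k receives its own supervised target, so no gradient has to travel through the whole stack.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neighborhood_features(prev_map, radius=1):
    """For every (i, j), gather the previous level's predictions in a
    (2*radius+1)^2 window around (i, j), zero-padded at the borders."""
    n = prev_map.shape[0]
    padded = np.pad(prev_map, radius)
    size = 2 * radius + 1
    windows = [padded[di:di + n, dj:dj + n]
               for di in range(size) for dj in range(size)]
    return np.stack(windows, axis=-1)  # shape (n, n, size * size)

def train_level(inputs, targets, epochs=200, lr=0.5):
    """Fit one logistic unit, shared across all (i, j), by gradient descent.
    (Assumption: a stand-in for the neural network at one level.)"""
    x = inputs.reshape(-1, inputs.shape[-1])
    y = targets.reshape(-1)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(x @ w + b) - y  # gradient of cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def train_stack(static_features, true_map, levels=3, radius=1):
    """Train levels one after another; level k sees the static features plus
    a window of level k-1's predicted map and is supervised on the true map."""
    n = true_map.shape[0]
    prev = np.zeros((n, n))  # assumption: uninformative map below the first level
    params, maps = [], []
    for _ in range(levels):
        inputs = np.concatenate(
            [static_features, neighborhood_features(prev, radius)], axis=-1)
        w, b = train_level(inputs, true_map)
        prev = sigmoid(inputs @ w + b)  # refined map handed to the next level
        params.append((w, b))
        maps.append(prev)
    return params, maps

# Toy data (entirely synthetic): contacts lie near the diagonal, and the
# static features are a noisy view of the true map plus a pure-noise channel.
rng = np.random.default_rng(0)
n = 12
idx = np.arange(n)
true_map = (np.abs(idx[:, None] - idx[None, :]) <= 2).astype(float)
static_features = np.stack(
    [true_map + 0.5 * rng.standard_normal((n, n)),
     rng.standard_normal((n, n))], axis=-1)

params, maps = train_stack(static_features, true_map, levels=3)
```

Each entry of `maps` is the predicted contact map after one level of refinement, with values in [0, 1]. Because every level is fitted against the true map on its own, adding more levels does not lengthen any backpropagation path, which is the property the text above relies on.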
