From Interatomic Distances to Protein Tertiary Structures with a Deep Convolutional Neural Network

Elucidating biologically active protein structures remains a daunting task in both the wet and the dry laboratory, and many proteins still lack structural characterization. This gap continues to motivate the development of computational methods for protein structure prediction. These methods are diverse in their approaches, and recent efforts have introduced deep learning-based methods for various sub-problems within the larger problem of protein structure prediction. In this paper, we focus on one such sub-problem: the reconstruction of three-dimensional structures consistent with given inter-atomic distances. Inspired by a recent architecture put forward in the larger context of generative frameworks, we design a deep convolutional network model and evaluate it on experimentally and computationally obtained tertiary structures. Comparison with convex and stochastic optimization-based methods shows that the deep model is faster and comparably or more accurate, opening up several avenues of further research to advance the larger problem of protein structure prediction.
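
To make the sub-problem concrete, the sketch below recovers three-dimensional coordinates from a complete, noise-free pairwise distance matrix using classical multidimensional scaling. This is a standard distance-geometry technique chosen only for illustration; it is not the deep convolutional model described here, nor necessarily one of the baselines it is compared against. The function names and the random test points are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for an (n, d) array of coordinates."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def coords_from_distances(D, dim=3):
    """Classical MDS: coordinates consistent with D, up to a rigid motion or reflection.

    Assumes D is a complete, symmetric, noise-free Euclidean distance matrix.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain the Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]
    w_top = np.clip(w[idx], 0.0, None)  # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)

# Hypothetical input: 10 random points standing in for atom positions.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(10, 3))
D = pairwise_distances(X_true)

X_rec = coords_from_distances(D)
max_err = np.abs(pairwise_distances(X_rec) - D).max()
print(f"largest distance mismatch: {max_err:.2e}")
```

The recovered coordinates match the input only up to rotation, translation, and reflection, so the check compares distance matrices rather than coordinates. With noisy or incomplete distances this closed-form approach degrades, which is the setting where optimization-based or learned methods become relevant.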
