Variational Autoencoders for Protein Structure Prediction

The universe of protein structures contains many dark regions beyond the reach of experimental techniques. Yet knowledge of the tertiary structure(s) that a protein employs to interact with partners in the cell is critical to understanding its biological function(s) and dysfunction(s). Great progress has been made in silico by methods that generate structures as part of an optimization process. More recently, generative models based on neural networks have been introduced for generating protein structures, but their evaluation is typically limited to showing that some of the generated structures are credible. In this paper, we go beyond this objective. We design variational autoencoders and evaluate whether they can replace existing, established methods, comparing various architectures against the popular Rosetta framework via rigorous metrics. The presented results are promising and show that, once seeded with a sufficient number of physically realistic structures, variational autoencoders are efficient models for generating realistic tertiary structures.
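The abstract does not specify the architectures or how structures are represented. Purely to illustrate the general variational autoencoder technique, the following minimal sketch assumes that each structure has already been flattened into a fixed-length, real-valued feature vector (a hypothetical choice). It shows the core ingredients: a Gaussian encoder, the reparameterization trick, and the negative evidence lower bound (reconstruction error plus the KL divergence to a standard-normal prior). New structures are generated by decoding samples drawn from that prior. `TinyVAE`, `linear`, and `init_layer` are hypothetical names, the model is untrained and uses single linear layers, and this is not the authors' implementation.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Hypothetical dense layer: small Gaussian weights and a zero bias."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyVAE:
    """Single-layer Gaussian VAE over fixed-length structure feature vectors.

    Hypothetical illustration only: untrained, no hidden layers, no optimizer.
    """

    def __init__(self, n_features, n_latent, seed=0):
        self.rng = random.Random(seed)
        self.n_latent = n_latent
        self.enc_mu = init_layer(n_latent, n_features, self.rng)
        self.enc_logvar = init_layer(n_latent, n_features, self.rng)
        self.dec = init_layer(n_features, n_latent, self.rng)

    def encode(self, x):
        # Approximate posterior q(z|x) = N(mu, diag(exp(logvar)))
        return linear(*self.enc_mu, x), linear(*self.enc_logvar, x)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I); the noise is separated from
        # mu and logvar so that gradients could flow through them in training
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z):
        return linear(*self.dec, z)

    def negative_elbo(self, x):
        mu, logvar = self.encode(x)
        x_hat = self.decode(self.reparameterize(mu, logvar))
        # Reconstruction term: squared error (Gaussian likelihood up to constants)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
        return recon + kl, recon, kl

    def generate(self, n_samples):
        # New structures (as feature vectors) are decoded from prior samples z ~ N(0, I)
        return [self.decode([self.rng.gauss(0.0, 1.0) for _ in range(self.n_latent)])
                for _ in range(n_samples)]


if __name__ == "__main__":
    vae = TinyVAE(n_features=6, n_latent=2, seed=42)
    loss, recon, kl = vae.negative_elbo([0.5, -1.0, 0.3, 0.8, -0.2, 0.1])
    samples = vae.generate(3)
    print(len(samples), len(samples[0]))  # 3 6
```

In practice, the encoder and decoder would be deeper networks trained by minimizing the negative ELBO with an automatic-differentiation framework. This sketch only shows the forward computation and the sampling step that makes a trained VAE usable as a structure generator.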
