Building maps of protein structure spaces in template-free protein structure prediction

An important goal in template-free protein structure prediction is to control the quality of the tertiary structures computed for a target amino-acid sequence. Despite great advances in algorithmic research, this task remains exceptionally challenging because of the size, dimensionality, and inherent characteristics of the protein structure space. Current practice is to generate as many structures as can be afforded, so as to increase the likelihood that some of them reside near the sought but unknown biologically active (native) structure. Under a fixed computational budget, this strategy is impractical, and it is not informed by any metric of interest. In this paper, we propose instead to equip algorithms that generate tertiary structures, also known as decoy generation algorithms, with memory of the protein structure space that they explore. Specifically, we propose an evolving, granularity-controllable map of the protein structure space that makes use of low-dimensional representations of protein structures. Evaluations on diverse target sequences, including recent hard CASP targets, show that storage can be reduced drastically without sacrificing decoy quality. The results make the case that integrating a map of the protein structure space is a promising mechanism for enhancing decoy generation algorithms in template-free protein structure prediction.
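
To make the idea of an evolving, granularity-controllable map concrete, the following is a minimal sketch and not the method of the paper. It assumes the map is a uniform grid over some low-dimensional feature vector of each decoy, that `cell_width` is the granularity knob, and that each cell retains only its lowest-energy decoy. The class `StructureMap`, its methods, and the retention policy are all hypothetical names and choices made for illustration.

```python
from math import floor
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


class StructureMap:
    """Hypothetical grid map over a low-dimensional decoy feature space.

    Assumption (not from the paper): each cell stores only the
    lowest-energy decoy seen so far, so storage is bounded by the number
    of occupied cells rather than by the number of generated decoys.
    """

    def __init__(self, cell_width: float) -> None:
        if cell_width <= 0:
            raise ValueError("cell_width must be positive")
        # Larger cell_width means a coarser map and fewer stored decoys.
        self.cell_width = cell_width
        self.cells: Dict[Cell, Tuple[float, str]] = {}

    def cell_of(self, features: Sequence[float]) -> Cell:
        """Map a feature vector to the integer index of its grid cell."""
        return tuple(floor(x / self.cell_width) for x in features)

    def add(self, decoy_id: str, features: Sequence[float], energy: float) -> bool:
        """Offer a decoy to the map; return True if the map stored it."""
        key = self.cell_of(features)
        current = self.cells.get(key)
        if current is None or energy < current[0]:
            self.cells[key] = (energy, decoy_id)
            return True
        return False


# Illustrative usage with made-up feature vectors and energies.
structure_map = StructureMap(cell_width=1.0)
structure_map.add("d1", [0.2, 0.7], -10.0)  # first decoy in its cell
structure_map.add("d2", [0.4, 0.9], -12.0)  # same cell, lower energy: replaces d1
structure_map.add("d3", [1.5, 0.3], -3.0)   # different cell: stored alongside d2
```

In this sketch the map evolves as decoys are offered to it, and increasing `cell_width` trades resolution for storage. A decoy generation algorithm could consult `cells` to decide which regions of the structure space remain unexplored.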
