Protein Decoy Generation via Adaptive Stochastic Optimization for Protein Structure Determination

Many regions of the protein universe remain inaccessible to wet-laboratory or homology modeling methods. Elucidating these regions necessitates structure determination in silico. Protein structure determination in the absence of a structural template remains a challenging task with two core problems, known as decoy generation and decoy selection. In this paper, we address the problem of decoy generation, which involves searching the unknown, vast, and high-dimensional structure space of a given amino-acid sequence for relevant structures under a finite computational budget. Leveraging a stochastic optimization framework, we first demonstrate how selection pressure can be employed to control the trade-off between exploration and exploitation. We then propose a novel algorithm that tunes its behavior towards exploration or exploitation as needed via an adaptive selection mechanism. We present a thorough comparative evaluation on 30 protein targets, in which we compare the proposed adaptive algorithm to state-of-the-art algorithms, including the top ten groups in two recent CASP competitions. The results show that the proposed algorithm is competitive with several of these groups and outperforms several of them on many targets, suggesting that adaptive stochastic optimization is a promising framework for decoy generation.
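To make the notion of selection pressure concrete, the following is a minimal sketch of a generic evolutionary loop and is not the algorithm proposed in this paper. It assumes tournament selection, where the tournament size `k` acts as the selection-pressure knob, and a simple diversity-based rule for adapting `k`. The toy `energy` function (Rastrigin), the thresholds, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

def energy(x):
    # Toy stand-in for an energy function (Rastrigin); NOT a protein energy model.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def tournament(pop, scores, k, rng):
    # Draw k distinct individuals and return the lowest-energy one.
    # Larger k means stronger selection pressure (more exploitation).
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]

def diversity(pop):
    # Mean per-coordinate standard deviation across the population.
    n, d = len(pop), len(pop[0])
    total = 0.0
    for j in range(d):
        mean = sum(p[j] for p in pop) / n
        total += math.sqrt(sum((p[j] - mean) ** 2 for p in pop) / n)
    return total / d

def adapt_k(k, div, low=0.05, high=0.5, k_min=2, k_max=8):
    # Assumed adaptation rule: relax pressure when the population has
    # collapsed (explore), tighten it when the population is spread out (exploit).
    if div < low:
        return max(k_min, k - 1)
    if div > high:
        return min(k_max, k + 1)
    return k

def run(generations=50, pop_size=40, dim=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.12, 5.12) for _ in range(dim)] for _ in range(pop_size)]
    k = 2
    best = min(energy(p) for p in pop)
    history = []
    for _ in range(generations):
        scores = [energy(p) for p in pop]
        k = adapt_k(k, diversity(pop))
        # Elitism: carry the current best individual over unchanged.
        elite = pop[min(range(pop_size), key=lambda i: scores[i])]
        children = [list(elite)]
        while len(children) < pop_size:
            parent = tournament(pop, scores, k, rng)
            # Gaussian mutation of the selected parent.
            children.append([xi + rng.gauss(0, 0.3) for xi in parent])
        pop = children
        best = min(best, min(energy(p) for p in pop))
        history.append((k, best))
    return history
```

Calling `run()` returns one `(k, best_energy)` pair per generation, which makes it easy to see how the tournament size changes as the population spreads out or collapses. In a decoy-generation setting, the individuals would be protein conformations and the energy would come from a molecular scoring function; both are outside the scope of this sketch.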
