Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure

Background Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, because their energy functions are inaccurate and their search is stochastic, fragment-based methods must generate a large number of decoys to explore the solution space adequately, which limits their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept at its default value for coil regions, substantial and dramatic reductions are proposed for beta-sheet and alpha-helical regions, respectively.
Results Our fragment selection approach was evaluated using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool, modified so that the number of fragment candidates used by Rosetta could be adjusted according to the local secondary structure. Compared with Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, with only 2000 decoys our enhanced version of Rosetta achieves performance equivalent to that of standard Rosetta using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore the conformation space satisfactorily.
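The cardinality rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a three-state predicted secondary structure string (H, E, C, as produced by predictors such as PSIPRED), labels each fragment window by its majority predicted state, and uses placeholder per-class counts (200/50/10), since the abstract does not give the actual values. Candidate scores are assumed to be lower-is-better, and the helpers `window_class`, `cardinalities` and `prune_library` are hypothetical names.

```python
from collections import Counter

# HYPOTHETICAL per-class candidate counts: the abstract only says coil keeps the
# default, strands get a substantial cut and helices a dramatic one.
CARDINALITY = {"C": 200, "E": 50, "H": 10}

# Tie-break toward the class that keeps more candidates (coil, then strand, then helix).
PRIORITY = {"C": 0, "E": 1, "H": 2}


def window_class(ss: str) -> str:
    """Majority predicted state (H/E/C) of one fragment window (assumed rule)."""
    counts = Counter(ss)
    return min(counts, key=lambda s: (-counts[s], PRIORITY[s]))


def cardinalities(ss_pred: str, frag_len: int) -> list[int]:
    """Number of candidates to keep for each 0-based fragment start position."""
    return [
        CARDINALITY[window_class(ss_pred[i:i + frag_len])]
        for i in range(len(ss_pred) - frag_len + 1)
    ]


def prune_library(
    library: dict[int, list[tuple[float, str]]], ss_pred: str, frag_len: int
) -> dict[int, list[tuple[float, str]]]:
    """Keep only the best-scoring (lowest score, assumed) candidates per position."""
    limits = cardinalities(ss_pred, frag_len)
    return {pos: sorted(cands)[: limits[pos]] for pos, cands in library.items()}
```

For example, `cardinalities("CCEEE", 3)` returns `[200, 50, 50]`: the first window is mostly coil and keeps the default count, while the two strand-dominated windows are reduced. Breaking ties toward coil is a conservative choice that avoids over-pruning windows straddling a loop boundary.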
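GDT, the metric behind the reported gains, is commonly reported as GDT_TS = 100 × (P1 + P2 + P4 + P8) / 4, where Pd is the fraction of target residues whose Cα atoms lie within d Å of the model. The sketch below computes it from per-residue deviations under a single, already-computed superposition; tools such as LGA instead search for the superposition that maximises each fraction separately, so this simplified version can underestimate the official score. The function name `gdt_ts` is illustrative.

```python
def gdt_ts(distances: list[float]) -> float:
    """GDT_TS (0-100) from per-residue CA deviations in angstroms.

    Pass one entry per TARGET residue, using float("inf") for residues missing
    from the model. Assumes a single fixed superposition (simplification).
    """
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

With deviations of 0.5, 1.5, 3.0, 5.0 and 10.0 Å, the four fractions are 0.2, 0.4, 0.6 and 0.8, giving a GDT_TS of 50.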
Conclusions Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of this information for customising the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted with Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results than standard Rosetta. Alternatively, results equivalent to those of standard Rosetta with 20,000 decoys can be achieved using only 2000 decoys. Consequently, we recommend adopting this strategy either to improve model quality significantly or to reduce processing times by a factor of 10.
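To show where a customised candidate count enters the fragment insertion process, here is a hypothetical sketch of a generic Metropolis Monte Carlo search over backbone torsions. It is not Rosetta's ab initio protocol, which uses multiple stages, coarse-grained scoring functions and additional moves; the function `fragment_insertion_mc`, the `library`/`limits` layout and the caller-supplied `energy` are illustrative assumptions.

```python
import math
import random
from typing import Callable

# (phi, psi) backbone torsions per residue, in degrees (simplified representation).
Torsions = list[tuple[float, float]]


def fragment_insertion_mc(
    start: Torsions,
    library: dict[int, list[Torsions]],
    limits: dict[int, int],
    energy: Callable[[Torsions], float],
    n_steps: int,
    temperature: float,
    rng: random.Random,
) -> tuple[Torsions, float]:
    """Metropolis Monte Carlo search driven by fragment insertion moves.

    library[pos] lists candidate fragments (best first) for the window starting
    at residue pos; limits[pos] caps how many may be sampled. That cap is where
    a secondary-structure-dependent cardinality would enter (hypothetical).
    """
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    positions = sorted(library)
    for _ in range(n_steps):
        pos = rng.choice(positions)
        # Only the top-N candidates are eligible; always keep at least one.
        candidates = library[pos][: max(1, limits[pos])]
        frag = rng.choice(candidates)
        # Overwrite the window's torsions with those of the chosen fragment.
        trial = current[:pos] + list(frag) + current[pos + len(frag):]
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: accept downhill moves, uphill with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

Capping helical windows at a handful of candidates while leaving coil windows broad means that most sampling diversity is spent where the sequence-structure relationship is least predictable, which matches the intuition given in the Results for needing fewer decoys.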
