Fast design of arbitrary length loops in proteins using InteractiveRosetta

Background
With increasing interest in ab initio protein design, there is a desire to be able to fully explore the design space of insertions and deletions. Nature inserts and deletes residues to optimize energy and function, but allowing variable length indels in the context of an interactive protein design session presents challenges with regard to speed and accuracy.

Results
Here we present a new module (INDEL) for InteractiveRosetta which allows the user to specify a range of lengths for a desired indel, and which returns a set of low energy backbones in a matter of seconds. To make the loop search fast, loop anchor points are geometrically hashed using Cα-Cα and Cβ-Cβ distances, and the hash is mapped to start and end points in a pre-compiled random access file of non-redundant protein backbone coordinates. Loops with superposable anchors are filtered for collisions and returned to InteractiveRosetta as poly-alanine for display and selective incorporation into the design template. Sidechains can then be added using RosettaDesign tools.

Conclusions
INDEL was able to find viable loops in 100% of 500 attempts for all lengths from 3 to 20 residues. INDEL has been applied to the task of designing a domain-swapping loop for T7-endonuclease I, changing its specificity from Holliday junctions to paranemic crossover (PX) DNA.
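The anchor hashing described in the Results can be illustrated with a minimal sketch. It is not the INDEL implementation: the bin width, the key layout, the in-memory dictionary (standing in for the pre-compiled random access file), and the names `hash_key`, `build_index`, and `lookup` are all assumptions made for illustration. The idea shown is only that binned Cα-Cα and Cβ-Cβ anchor distances, together with the loop length, form a key that maps directly to candidate start and end positions in a backbone database.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.5  # assumed bin size in angstroms; not taken from the paper


def hash_key(ca_dist, cb_dist, length, bin_width=BIN_WIDTH):
    """Discretize the two anchor distances into a hashable key (hypothetical layout)."""
    return (length, int(ca_dist // bin_width), int(cb_dist // bin_width))


def build_index(residues, min_len, max_len):
    """Index every anchor pair in one chain.

    residues: list of (ca_xyz, cb_xyz) coordinate tuples, one per residue.
    length:   number of loop residues strictly between the two anchors.
    Returns a dict mapping hash keys to lists of (start, end) anchor indices.
    """
    index = defaultdict(list)
    for start in range(len(residues)):
        for length in range(min_len, max_len + 1):
            end = start + length + 1
            if end >= len(residues):
                break  # longer loops would run past the end of the chain
            ca_d = math.dist(residues[start][0], residues[end][0])
            cb_d = math.dist(residues[start][1], residues[end][1])
            index[hash_key(ca_d, cb_d, length)].append((start, end))
    return index


def lookup(index, ca_dist, cb_dist, lengths):
    """Return candidate (start, end) pairs for each requested loop length."""
    hits = []
    for length in lengths:
        hits.extend(index.get(hash_key(ca_dist, cb_dist, length), []))
    return hits


# Toy chain: 10 residues on a straight line, 3.8 angstroms apart.
chain = [((3.8 * i, 0.0, 0.0), (3.8 * i, 1.5, 0.0)) for i in range(10)]
index = build_index(chain, min_len=3, max_len=5)
candidates = lookup(index, ca_dist=15.2, cb_dist=15.2, lengths=[3])
```

A lookup that probes a single bin can miss candidates whose distances fall just across a bin boundary, so a practical version would likely also probe neighboring bins. Candidates found this way would still need superposition onto the query anchors and collision filtering, as the abstract describes.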
