Rapid search for tertiary fragments reveals protein sequence–structure relationships

Finding backbone substructures in the Protein Data Bank that match an arbitrary query structural motif composed of multiple disjoint segments is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation (RMSD) cutoff. We show that, despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. Because exploring naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner, we go on to illustrate the utility of MASTER in protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure.
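To make the search problem concrete, here is a minimal Python sketch of the problem statement only. It is not MASTER's algorithm. It naively enumerates every non-overlapping placement of the query segments in a single target chain and keeps the placements whose RMSD after optimal rigid superposition is at or below the cutoff. For a target of length L and k segments, this visits up to roughly L^k placements, which illustrates the potentially exponential complexity mentioned above. Assumptions made for illustration: each residue is represented by a single point (for example its Cα atom) rather than the full backbone, segments may map to the target in any order as long as they do not overlap, and the RMSD is computed from the largest eigenvalue of Horn's quaternion matrix. That formulation gives the same minimum as Kabsch superposition restricted to proper rotations, and a small Jacobi eigensolver keeps the code standard-library only. All function names are hypothetical.

```python
import itertools
import math


def _max_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    total = sum(x * x for row in a for x in row)  # invariant under the rotations
    for _ in range(sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= 1e-30 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposition_rmsd(x, y):
    """RMSD of two equal-length point lists after the best rotation + translation.

    Uses rmsd^2 = (Gx + Gy - 2 * lambda_max) / n, where lambda_max is the largest
    eigenvalue of Horn's 4x4 quaternion matrix (proper rotations only).
    """
    n = len(x)
    cx = [sum(p[i] for p in x) / n for i in range(3)]
    cy = [sum(p[i] for p in y) / n for i in range(3)]
    xc = [[p[i] - cx[i] for i in range(3)] for p in x]
    yc = [[p[i] - cy[i] for i in range(3)] for p in y]
    # Cross-covariance S[a][b] = sum_k xc_k[a] * yc_k[b].
    S = [[sum(u[a] * v[b] for u, v in zip(xc, yc)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in xc for v in p) + sum(v * v for p in yc for v in p)
    return math.sqrt(max(0.0, (g - 2.0 * _max_eigenvalue(N)) / n))


def exhaustive_motif_search(query_segments, target, cutoff):
    """Naive baseline: return (starts, rmsd) for every non-overlapping placement
    of the query segments in one target chain with rmsd <= cutoff."""
    query = [p for seg in query_segments for p in seg]
    ranges = [range(len(target) - len(seg) + 1) for seg in query_segments]
    hits = []
    for starts in itertools.product(*ranges):  # up to ~len(target) ** k placements
        spans = sorted((s, s + len(seg)) for s, seg in zip(starts, query_segments))
        # Disjoint query segments may not share target residues.
        if any(end > nxt for (_, end), (nxt, _) in zip(spans, spans[1:])):
            continue
        matched = [target[s + i] for s, seg in zip(starts, query_segments) for i in range(len(seg))]
        rmsd = superposition_rmsd(query, matched)
        if rmsd <= cutoff:
            hits.append((starts, rmsd))
    return hits


if __name__ == "__main__":
    # Hypothetical single-point-per-residue "chain" of 12 residues.
    target = [(i * 1.1, (i * i % 7) * 1.3, math.sin(i) * 4.0) for i in range(12)]

    def move(p):
        # Rigid motion: 90-degree rotation about z, then a translation.
        return (-p[1] + 1.0, p[0] - 2.0, p[2] + 3.0)

    # Two-segment query cut from residues 2-4 and 7-9, then moved rigidly.
    query = [[move(p) for p in target[2:5]], [move(p) for p in target[7:10]]]
    for starts, rmsd in exhaustive_motif_search(query, target, cutoff=1e-3):
        print(starts, rmsd)
```

Running the example, the planted placement (2, 7) should be reported with an RMSD close to zero. Because the superposition allows only proper rotations, a mirror image of a chiral arrangement is not counted as a match.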
