Computing energy landscape maps and structural excursions of proteins

Background: Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics.

Methods: We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically feasible excursions between structures of interest. An evolutionary algorithm builds such maps within a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path-searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically feasible basin-to-basin excursions.

Results: Evaluation is conducted on intrinsically dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, previously unknown thermodynamically stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship.

Conclusions: Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-to-basin excursions of a protein within a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained, because the methodology forgoes simulation of dynamics, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in prompting further research in wet laboratories.
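
The map-then-query idea described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the computed map is available as a list of low-dimensional sample coordinates with one energy value per sample, builds a brute-force k-nearest-neighbor graph over them, and runs a Dijkstra-style search in which a step from sample u to sample v costs max(0, E[v] - E[u]), so that the returned path minimizes total uphill energy climb. The function names (`knn_graph`, `lowest_climb_path`), the edge cost, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph, symmetrized (hypothetical helper)."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        by_dist = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in by_dist[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def lowest_climb_path(adj, energies, start, goal):
    """Dijkstra search; moving u -> v costs max(0, E[v] - E[u]) (assumed cost)."""
    best = {start: 0.0}
    parent = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            break
        if cost > best[u]:
            continue  # stale heap entry
        for v in adj[u]:
            new_cost = cost + max(0.0, energies[v] - energies[u])
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    if goal not in parent:
        return None, math.inf  # goal unreachable in the graph
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[goal]


# Toy 2D "map": two basins (samples 0 and 4) separated by a high barrier
# on the direct route (sample 2) and a lower barrier on a detour (sample 6).
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (1, 1), (2, 1), (3, 1)]
energies = [-5.0, 0.0, 4.0, 0.0, -5.0, -3.0, -1.0, -3.0]

path, climb = lowest_climb_path(knn_graph(points, 3), energies, 0, 4)
print(path, climb)
```

On this toy map the search avoids the high-energy sample on the direct route and returns the detour `[0, 5, 6, 7, 4]` with a total climb of 4.0. Other edge costs (for example, one that penalizes the single highest barrier along the path) would be equally plausible choices; the abstract does not specify which cost the methodology uses.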
