Conformational ensembles and sampled energy landscapes: Analysis and comparison

We present novel algorithms and software addressing four core problems in computational structural biology: analyzing a conformational ensemble, comparing two conformational ensembles, analyzing a sampled energy landscape, and comparing two sampled energy landscapes. Using recent developments in computational topology, graph theory, and combinatorial optimization, we make two notable contributions. First, we present a generic algorithm for analyzing height fields. We then use this algorithm to perform density‐based clustering of conformations, and to analyze a sampled energy landscape in terms of basins and the transitions between them. In both cases, topological persistence is used to manage (geometric) frustration. Second, we introduce two algorithms to compare transition graphs. The first is based on the classical earth mover's distance, which depends only on the local minimum energy configurations and their statistical weights, while the second incorporates topological constraints inherent to conformational transitions. Illustrations are provided on a simplified protein model (BLN69), whose frustrated potential energy landscape has been thoroughly studied. The software implementing our tools is also made available, and should prove valuable wherever conformational ensembles and energy landscapes are used. © 2015 Wiley Periodicals, Inc.
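
To make the role of topological persistence concrete, the following is a minimal sketch of how basins of a sampled height field can be ranked by persistence. It is not the implementation described above: the function name `basin_persistence`, the neighbor graph given as a plain edge list, and the toy data are all assumptions made for illustration. Samples are swept by increasing height with a union-find structure. A basin is born at each local minimum and dies when it merges into a deeper basin, and its persistence is the height of the merging sample minus the height of its minimum.

```python
def basin_persistence(heights, edges):
    """Return {local_minimum_index: persistence} for a height field on a graph.

    heights: one value per sample (for example, the energy of a conformation).
    edges: (i, j) pairs linking neighbouring samples (hypothetical input format).
    A basin that never merges into a deeper one gets persistence float('inf').
    """
    n = len(heights)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    parent = [None] * n  # None means the sample has not been swept yet

    def find(i):
        # Union-find lookup with path halving; each root is its basin's minimum.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    persistence = {}
    for v in sorted(range(n), key=lambda i: heights[i]):
        parent[v] = v
        roots = {find(u) for u in neighbors[v] if parent[u] is not None}
        if not roots:
            persistence[v] = float("inf")  # v is a local minimum: a basin is born
            continue
        survivor = min(roots, key=lambda r: heights[r])  # deepest basin survives
        parent[v] = survivor
        for r in roots:
            if r != survivor:
                # v acts as the merge (saddle) level between basin r and survivor
                persistence[r] = heights[v] - heights[r]
                parent[r] = survivor
    return persistence


# Toy 1D landscape with minima at indices 0, 2 and 4.
heights = [0.0, 2.0, 1.0, 3.0, 0.5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
pers = basin_persistence(heights, edges)

# Keep only basins whose persistence reaches a chosen threshold (here 1.5).
significant = sorted(m for m, p in pers.items() if p >= 1.5)
```

On this toy landscape the shallow basin at index 2 (persistence 1.0) falls below the threshold and is absorbed, which is the sense in which a persistence cutoff filters out small-scale frustration. The same sweep applied to the negated density estimate of a point cloud gives a simple form of density-based mode seeking.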
