A Data-Driven Evolutionary Algorithm for Mapping Multibasin Protein Energy Landscapes

Evidence is emerging that many proteins involved in proteinopathies are dynamic molecules that switch between stable and semistable structures to modulate their function. A detailed understanding of the relationship between structure and function in such molecules demands a comprehensive characterization of their conformation space. Currently, only stochastic optimization methods are capable of exploring conformation spaces to obtain sample-based representations of the associated energy surfaces. These methods must address the fundamental but challenging issue of balancing computational resources between exploration (obtaining a broad view of the space) and exploitation (descending deep into the energy surface). We propose a novel algorithm that strikes an effective balance by employing concepts from evolutionary computation. The algorithm leverages deposited crystal structures of wildtype and variant sequences of a protein to define a reduced, low-dimensional search space from which samples can be drawn rapidly. A multiscale technique maps samples to local minima of the all-atom energy surface of the protein under investigation. Several novel algorithmic strategies are employed to avoid premature convergence to particular minima and to obtain a broad view of a possibly multibasin energy surface. Applications to different proteins demonstrate the broad utility of the algorithm for mapping multibasin energy landscapes and advancing the modeling of multibasin proteins. In particular, applications to wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm makes an important first step toward understanding the impact of sequence mutations on dysfunction, by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
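
The balance between exploration and exploitation described above can be illustrated with a minimal sketch. It is not the proposed algorithm. It assumes a hypothetical two-dimensional reduced space, a toy two-basin function (`toy_energy`) in place of an all-atom energy, a greedy random descent (`local_improve`) in place of the multiscale mapping to local minima, and a crowding-style replacement rule as one possible way to avoid premature convergence. All names and parameter values are illustrative assumptions.

```python
import math
import random


def toy_energy(x):
    """Hypothetical two-basin surface in a 2-D reduced space (not a protein energy)."""
    deep = (x[0] + 2.0) ** 2 + x[1] ** 2            # basin near (-2, 0), minimum 0.0
    shallow = (x[0] - 2.0) ** 2 + x[1] ** 2 + 0.5   # basin near (2, 0), minimum 0.5
    return min(deep, shallow)


def local_improve(x, energy, rng, steps=50, step_size=0.2):
    """Greedy random descent: a stand-in for mapping a sample to a nearby minimum."""
    best, best_e = list(x), energy(x)
    for _ in range(steps):
        cand = [c + rng.gauss(0.0, step_size) for c in best]
        cand_e = energy(cand)
        if cand_e < best_e:                          # exploitation: accept only downhill moves
            best, best_e = cand, cand_e
    return best, best_e


def evolve(energy, pop_size=30, generations=40, sigma=0.8, seed=0):
    """Evolutionary loop with crowding-style replacement (assumed, for illustration)."""
    rng = random.Random(seed)
    pop = [
        local_improve([rng.uniform(-4, 4), rng.uniform(-4, 4)], energy, rng)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        for i in range(pop_size):
            parent, _ = pop[i]
            # Exploration: a broad perturbation in the reduced space.
            child = [c + rng.gauss(0.0, sigma) for c in parent]
            child, child_e = local_improve(child, energy, rng)
            # Crowding: the child competes only with its nearest population member,
            # so a shallow basin is not wiped out by samples from a deeper one.
            j = min(range(pop_size), key=lambda k: math.dist(child, pop[k][0]))
            if child_e < pop[j][1]:
                pop[j] = (child, child_e)
    return pop
```

Calling `evolve(toy_energy)` returns a list of `(point, energy)` pairs. Restricting competition to the nearest neighbor is the design choice that keeps both basins represented in the final population; replacing the globally worst member instead would tend to concentrate samples in the deeper basin.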
