Inferring repeat-protein energetics from evolutionary information

Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold to specific structures, and it has been shown that the main native ensemble can be inferred by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description of the energetic variation caused by sequence modifications in repeat proteins, systems in which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pairwise interactions and treat higher-order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting evolutionary energy field allows the prediction of the folding free energy change for several mutants and can be used to generate synthetic sequences that are statistically indistinguishable from their natural counterparts.
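
As an illustration only, the kind of evolutionary energy described above (single-site terms, pairwise terms, and one additional term for higher-order correlations) might be sketched as a Potts-like score. The abstract does not give the functional form, so everything here is an assumption: the names `potts_energy` and `delta_energy`, the array layouts of `h` and `J`, the sign convention (lower energy means more favorable), and in particular the `higher_order` callable weighted by `lam`, which is a hypothetical placeholder for the single extra term.

```python
import numpy as np

def potts_energy(seq, h, J, lam=0.0, higher_order=None):
    """Potts-like evolutionary energy of one sequence (illustrative sketch).

    seq          : length-L integer array, each entry in 0..q-1 (amino acid index)
    h            : (L, q) array of single-site parameters (assumed layout)
    J            : (L, L, q, q) array of pairwise parameters; only i < j is used
    lam          : weight of the extra term (hypothetical)
    higher_order : optional callable seq -> float standing in for the single
                   higher-order term; its real form is not stated in the abstract
    """
    seq = np.asarray(seq)
    idx = np.arange(seq.shape[0])
    # Sum of h_i(s_i) over all positions.
    e_single = h[idx, seq].sum()
    # pair[i, j] = J[i, j, s_i, s_j]; keep each pair once (i < j).
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    e_pair = np.triu(pair, k=1).sum()
    # Placeholder for the single higher-order term.
    e_extra = lam * higher_order(seq) if higher_order is not None else 0.0
    # Assumed sign convention: more favorable sequences get lower energy.
    return -(e_single + e_pair + e_extra)

def delta_energy(wild_type, mutant, h, J, lam=0.0, higher_order=None):
    """Energy change on mutation, E(mutant) - E(wild type)."""
    return (potts_energy(mutant, h, J, lam, higher_order)
            - potts_energy(wild_type, h, J, lam, higher_order))
```

Under these assumptions, `delta_energy` gives the score difference that one would compare against a measured folding free energy change for a mutant. The comparison itself, including any scaling between the two quantities, is not specified here.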
