Coarse-graining protein energetics in sequence variables.

We show that cluster expansions (CE), previously used to model solid-state materials with binary or ternary configurational disorder, can be extended to the protein design problem. We present a generalized CE framework in which properties such as energy can be unambiguously expanded in amino-acid sequence space. The CE coarse-grains over nonsequence degrees of freedom (e.g., side-chain conformations) and thereby simplifies the problem of designing proteins, or of predicting the compatibility of a sequence with a given structure, by many orders of magnitude. The CE is physically transparent and can be evaluated through linear regression on the energies of training sequences. As an example, we show that good prediction accuracy is obtained with terms up to pairwise interactions for a coiled-coil backbone, and that triplet interactions are important in the energetics of a more globular zinc-finger backbone.
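To make the regression step concrete, the following is a minimal sketch of a cluster expansion truncated at pair terms and fitted by least squares. It is not the authors' implementation. The alphabet size, the number of sites, the indicator-function basis relative to a reference residue, the synthetic energy function `toy_energy`, and the choice of training set are all assumptions made for illustration; in practice the training energies would come from a structure-based energy model.

```python
import itertools
import numpy as np

N_SITES = 4   # hypothetical number of designable positions
N_TYPES = 3   # hypothetical alphabet size (a real design uses up to 20)

def features(seq):
    """Constant, point, and pair cluster functions for one sequence.

    Assumed basis: indicator functions relative to reference type 0.
    """
    point = {(i, a): float(seq[i] == a)
             for i in range(N_SITES) for a in range(1, N_TYPES)}
    row = [1.0]                      # empty cluster (constant term)
    row.extend(point.values())       # point clusters
    for i, j in itertools.combinations(range(N_SITES), 2):
        for a in range(1, N_TYPES):
            for b in range(1, N_TYPES):
                row.append(point[(i, a)] * point[(j, b)])  # pair clusters
    return np.array(row)

# Synthetic stand-in for the training energies: random one-body and
# two-body terms, so a pairwise expansion can represent it exactly.
rng = np.random.default_rng(0)
H = rng.normal(size=(N_SITES, N_TYPES))
J = rng.normal(size=(N_SITES, N_SITES, N_TYPES, N_TYPES))

def toy_energy(seq):
    e = sum(H[i, seq[i]] for i in range(N_SITES))
    e += sum(J[i, j, seq[i], seq[j]]
             for i, j in itertools.combinations(range(N_SITES), 2))
    return e

all_seqs = list(itertools.product(range(N_TYPES), repeat=N_SITES))
# Train on sequences with at most two non-reference residues; this set
# determines every constant, point, and pair coefficient.
train = [s for s in all_seqs if sum(a != 0 for a in s) <= 2]
test = [s for s in all_seqs if sum(a != 0 for a in s) > 2]

X_train = np.array([features(s) for s in train])
y_train = np.array([toy_energy(s) for s in train])
X_test = np.array([features(s) for s in test])
y_test = np.array([toy_energy(s) for s in test])

# Linear regression for the effective cluster interactions.
coeffs, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
max_err = np.max(np.abs(X_test @ coeffs - y_test))
print(f"{len(train)} training sequences, {len(test)} test sequences")
print(f"max abs error on held-out sequences: {max_err:.2e}")
```

Because the synthetic energy contains only one- and two-body terms, the pair-truncated expansion reproduces the held-out energies to numerical precision. With a real energy function, a residual error that persists at the pair level would suggest adding higher-order clusters such as triplets.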
