MODELING AND INFERENCE OF SEQUENCE-STRUCTURE SPECIFICITY

To evaluate protein sequences for simultaneous satisfaction of evolutionary and physical constraints, this paper develops a graphical model approach that integrates sequence information from the evolutionary record of a protein family with structural information based on a molecular mechanics force field. Nodes in the graphical model represent choices for the backbone (native vs. non-native), amino acids (conservation analysis), and side-chain conformations (rotamer library). Edges capture dependence relationships in both the sequence (correlated mutations) and the structure (direct physical interactions). The sequence and structure components of the model are complementary: the structure component may support choices that are absent from the sequence record because of bias and artifacts, while the sequence component may capture other constraints on protein viability, such as permitting an efficient folding pathway. Inferential procedures enable computation of the joint probability of a sequence-structure pair, thereby assessing the quality of the sequence with respect to both the protein family and the specificity of its energetic preference for the native structure over alternate backbone structures. In a case study of WW domains, we show that using the joint model and evaluating specificity yields better prediction of the foldedness of designed proteins (AUC of 0.85) than either a sequence-only or a structure-only model, and provides insight into how, where, and why the sequence and structure components complement each other.
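
As a rough illustration of the kind of scoring described above, the following minimal sketch combines a sequence term (node and edge weights standing in for conservation and correlated mutations) with a structure term (a Boltzmann sum over rotamer choices for each backbone), and reports specificity as the log-odds of the native backbone against the alternates. Everything here is an assumption made for illustration: the two-position toy alphabet, the names `seq_log_potential`, `struct_log_partition`, `specificity`, `joint_score`, and `toy_energy`, the made-up energy values, and the choice to simply add the two log scores. Brute-force enumeration stands in for the paper's inferential procedures, which are not specified in this abstract.

```python
import math
from itertools import product

# Hypothetical toy setup (not from the paper): two rotamers per position,
# one native backbone and one alternate backbone.
ROTAMERS = (0, 1)
BACKBONES = ("native", "alt")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def seq_log_potential(seq, node_w, edge_w):
    """Sequence component: per-position terms plus pairwise coupling terms."""
    total = sum(node_w[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in edge_w.items():
        total += table.get((seq[i], seq[j]), 0.0)
    return total


def struct_log_partition(seq, backbone, energy, kT=1.0):
    """Structure component: log Boltzmann sum over all rotamer assignments."""
    terms = [-energy(seq, rots, backbone) / kT
             for rots in product(ROTAMERS, repeat=len(seq))]
    return _logsumexp(terms)


def specificity(seq, energy, kT=1.0):
    """Log-odds of the native backbone against all alternate backbones."""
    logz = {b: struct_log_partition(seq, b, energy, kT) for b in BACKBONES}
    alternates = [logz[b] for b in BACKBONES if b != "native"]
    return logz["native"] - _logsumexp(alternates)


def joint_score(seq, node_w, edge_w, energy, kT=1.0):
    """Assumed combination: sequence log potential plus structural specificity."""
    return seq_log_potential(seq, node_w, edge_w) + specificity(seq, energy, kT)


def toy_energy(seq, rots, backbone):
    """Made-up energy: the native backbone rewards L at position 0 in rotamer 0."""
    e = 0.5 * sum(rots)
    if backbone == "native" and seq[0] == "L" and rots[0] == 0:
        e -= 2.0
    return e


# Made-up sequence statistics for a two-position family.
node_w = [{"A": 0.0, "L": -0.1}, {"A": 0.2, "L": 0.0}]
edge_w = {(0, 1): {("L", "A"): 0.3}}

for seq in (("A", "A"), ("L", "A")):
    print(seq, round(joint_score(seq, node_w, edge_w, toy_energy), 3))
```

In this toy, a sequence whose energies are identical on both backbones has zero specificity, so only its sequence term contributes. A sequence that the native backbone stabilizes gains a positive structural contribution even when its sequence statistics are slightly unfavorable, which mirrors the complementarity the abstract describes.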
