The Inverse Protein Folding Problem: Protein Design and Structure Prediction in the Genomic Era

Millions of proteins are identified every year by high-throughput genome sequencing projects, and many others can potentially be created by protein engineering and design methods. Here, we review a method for computational protein design (CPD), which starts from a known protein and its 3D structure and seeks to modify it by mutating some or all of the amino acid sidechains. The mutations are selected to provide stability and possibly other properties, such as ligand binding. For each set of candidate mutations, the 3D structure is modeled under the assumption of small, localized perturbations; in particular, we assume that the backbone conformation does not change significantly. As in other CPD implementations, the structure is modeled using a classical molecular mechanics approach along with a simple, implicit description of the solvent. Some of the calculations have been distributed to volunteers on the Internet through our Proteins@Home volunteer computing project. We describe the method and selected results, which show that the designed sequences share important properties of natural proteins.
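To make the fixed-backbone idea concrete, the following is a minimal sketch of sequence design as a search over sidechain types with a pairwise-decomposable energy. It is an illustration only and not the implementation reviewed here: the four-letter alphabet, the random energy tables (which stand in for molecular mechanics and implicit solvent terms), the Metropolis Monte Carlo search, and all function names are assumptions made for the example. Sidechain rotamers are also ignored, so each position carries a single state per amino acid type.

```python
import math
import random

AMINO_ACIDS = "AVLK"  # toy alphabet (assumption; real designs use ~20 types)


def make_toy_tables(n_pos, rng):
    """Random one-body and two-body energy tables (placeholders only).

    In a real CPD calculation these would be precomputed from a force
    field and a solvent model on the fixed backbone.
    """
    single = [{a: rng.uniform(-1, 1) for a in AMINO_ACIDS} for _ in range(n_pos)]
    pair = {
        (i, j): {(a, b): rng.uniform(-1, 1) for a in AMINO_ACIDS for b in AMINO_ACIDS}
        for i in range(n_pos)
        for j in range(i + 1, n_pos)
    }
    return single, pair


def total_energy(seq, single, pair):
    """Sum of per-position terms and position-pair terms for one sequence."""
    energy = sum(single[i][a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy += pair[(i, j)][(seq[i], seq[j])]
    return energy


def design(n_pos, single, pair, steps=2000, kT=0.5, seed=0):
    """Metropolis Monte Carlo over sequences; returns the best one visited."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(n_pos)]
    energy = total_energy(seq, single, pair)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(n_pos)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_energy = total_energy(seq, single, pair)
        accept = new_energy <= energy or rng.random() < math.exp(
            -(new_energy - energy) / kT
        )
        if accept:
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject: restore the previous sidechain type
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    tables_rng = random.Random(42)
    single, pair = make_toy_tables(6, tables_rng)
    sequence, energy = design(6, single, pair)
    print(sequence, round(energy, 3))
```

Because the energy is a sum of one-body and two-body terms, the tables can be computed once for a given backbone and reused for every candidate sequence. That separation is what makes this kind of search cheap to repeat and easy to split into independent jobs.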
