Exploring the Universe of Protein Structures beyond the Protein Data Bank

It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures remains highly debated. Using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we find that the known folds form a rather small subset, which cannot be reproduced by choosing random structures from the database. Rather, natural and possible folds differ in contact order, which is on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Besides their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.
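
The contact order mentioned above can be illustrated with a minimal sketch. It uses the common definition of relative contact order, the mean sequence separation between contacting residues divided by the chain length. The function name, the use of one coordinate per residue (for example Cα atoms), the 8 Å cutoff, and the minimum sequence separation are assumptions made for illustration, not the procedure used in this work.

```python
import numpy as np

def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Relative contact order from one coordinate per residue.

    Assumptions (not taken from the paper): contacts are defined on a
    single point per residue, e.g. C-alpha atoms, with a distance cutoff
    in angstroms; pairs closer than `min_separation` in sequence are ignored.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # All residue pairs (i, j) with j - i >= min_separation.
    i, j = np.triu_indices(n, k=min_separation)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    in_contact = dist < cutoff
    n_contacts = int(in_contact.sum())
    if n_contacts == 0:
        return 0.0
    # Mean sequence separation of contacting pairs, normalized by chain length.
    return float((j - i)[in_contact].sum() / (n_contacts * n))

# Toy example: a fully extended 10-residue chain with 3.8 angstrom spacing.
extended = np.array([[3.8 * k, 0.0, 0.0] for k in range(10)])
print(relative_contact_order(extended))
```

An extended chain has only short-range contacts and therefore a low value; folds whose contacting residues are far apart in sequence score higher.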
