Deriving High-Resolution Protein Backbone Structure Propensities from All Crystal Data Using the Information Maximization Device

The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of the backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established from fundamental information-theoretic concepts and then applied to derive highly resolved phi-psi maps for all 20 single amino acids and all 8,000 amino acid triplet sequences, at an optimal resolution determined by the volume of currently available data. The paper shows that using the latent information contained in all viable high-resolution crystal structures in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. The effectiveness of the IMD and the superiority of the resulting PDFs are demonstrated by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it optimizes PDFs automatically, the IMD improves the performance of knowledge-based potentials that rely on such PDFs. It also provides a simple computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
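The idea of letting the amount of data set the resolution of a phi-psi map can be illustrated with a small sketch. The abstract does not specify the IMD's actual criterion, so the code below swaps in a simpler, named stand-in: it bins phi-psi pairs on an n x n grid and picks the n that maximizes the mean log density on held-out data. All function names, the pseudocount, and the synthetic two-cluster sample are assumptions made for illustration only, not the authors' procedure or data.

```python
import math
import random

def bin_index(phi, psi, n):
    """Map a (phi, psi) pair in degrees to a cell of an n x n grid over [-180, 180)."""
    width = 360.0 / n
    i = int(((phi + 180.0) % 360.0) // width)
    j = int(((psi + 180.0) % 360.0) // width)
    # Clamp guards against floating-point results landing exactly on 360.
    return min(i, n - 1) * n + min(j, n - 1)

def fit_histogram(train, n, pseudocount=1.0):
    """Estimate cell probabilities from training pairs (pseudocount is an assumed smoother)."""
    counts = [pseudocount] * (n * n)
    for phi, psi in train:
        counts[bin_index(phi, psi, n)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def heldout_log_density(probs, test, n):
    """Mean log2 probability density (per square degree) of held-out pairs."""
    cell_area = (360.0 / n) ** 2
    logs = [math.log2(probs[bin_index(phi, psi, n)] / cell_area) for phi, psi in test]
    return sum(logs) / len(logs)

def best_resolution(train, test, candidates):
    """Return the grid size with the highest held-out score, plus all scores."""
    scores = {n: heldout_log_density(fit_histogram(train, n), test, n)
              for n in candidates}
    return max(scores, key=scores.get), scores

def synthetic_sample(n_points, seed):
    """Hypothetical data: two Gaussian clusters placed near helix-like and strand-like angles."""
    rng = random.Random(seed)
    centers = [(-60.0, -45.0), (-120.0, 130.0)]
    points = []
    for _ in range(n_points):
        cphi, cpsi = rng.choice(centers)
        points.append((rng.gauss(cphi, 15.0), rng.gauss(cpsi, 15.0)))
    return points

if __name__ == "__main__":
    train = synthetic_sample(2000, seed=1)
    test = synthetic_sample(500, seed=2)
    n_best, scores = best_resolution(train, test, [1, 6, 12, 24, 36, 72])
    print("selected grid size:", n_best)
    for n, s in sorted(scores.items()):
        print(n, round(s, 3))
```

With few observations, fine grids are penalized because many cells are filled only by the pseudocount; with more observations, finer grids become supportable. A sequence-dependent version would repeat the same selection separately for each amino acid or triplet class.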
