Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in several important ways that determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities, ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest-neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor-dependent and 420 right-neighbor-dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
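
The count of 420 neighbor-dependent distributions on each side is consistent with 21 central residue types (the 20 amino acids, with proline split into cis and trans) each paired with 20 possible neighbor types. The short sketch below only illustrates that arithmetic. The 21 x 20 reading is an assumption inferred from the numbers in the abstract, and the labels `PRO_TRANS`, `PRO_CIS`, and the key layout are hypothetical, not the naming used in the distributed files.

```python
from itertools import product

# The 20 standard amino acids (three-letter codes).
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]

# Assumption: the central residue has 21 types because proline is split
# into cis and trans forms. These two labels are hypothetical.
CENTRAL_TYPES = [aa for aa in AMINO_ACIDS if aa != "PRO"] + ["PRO_TRANS", "PRO_CIS"]

# Assumption: the neighbor is described by one of the 20 amino acid types.
NEIGHBOR_TYPES = AMINO_ACIDS

# One key per (central residue type, neighbor type), for each side.
LEFT_KEYS = [("left", c, n) for c, n in product(CENTRAL_TYPES, NEIGHBOR_TYPES)]
RIGHT_KEYS = [("right", c, n) for c, n in product(CENTRAL_TYPES, NEIGHBOR_TYPES)]

if __name__ == "__main__":
    print(len(CENTRAL_TYPES), len(LEFT_KEYS), len(RIGHT_KEYS))
```

Under these assumptions, each key would index one two-dimensional density over the backbone dihedral angles phi and psi for the central residue.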
