Statistical Estimation of Statistical Mechanical Models: Helix-Coil Theory and Peptide Helicity Prediction

Analysis of biopolymer sequences and structures generally adopts one of two approaches: detailed biophysical theoretical models of the system with experimentally determined parameters, or largely empirical statistical models whose parameters are extracted from large datasets. In this work, we demonstrate a merger of these two approaches using Bayesian statistics. We adopt a common biophysical model for local protein folding and peptide configuration, the helix-coil model. The parameters of this model are estimated by statistical fitting to a large dataset, using prior distributions based on experimental data. L1-norm shrinkage priors are applied to induce sparsity among the estimated parameters, resulting in a significantly simplified model. We also present formal statistical procedures for evaluating the support in the data for previously proposed model extensions. We demonstrate the advantages of this approach, including improved prediction accuracy and quantification of prediction uncertainty, and discuss opportunities for statistical design of experiments. Our approach yields a 39% improvement in mean-squared predictive error over the current best algorithm for this problem. In the process, we also provide an efficient recursive algorithm for exact calculation of ensemble helicity including side-chain interactions. We also derive explicit relations between homopolymer helix-coil theory and Markov chains, and between heteropolymer helix-coil theory and (non-standard) hidden Markov models. Neither relation has previously appeared in the literature.
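The abstract mentions two things: an exact recursive algorithm for ensemble helicity, and a correspondence between heteropolymer helix-coil theory and hidden Markov models. The function below is a minimal sketch of that idea. It uses the simpler Zimm-Bragg weighting, not the Lifson-Roig-type model with side-chain interactions that the work itself fits. It computes mean fractional helicity exactly with a forward-backward recursion in the style of the HMM forward algorithm.

The function names `zimm_bragg_helicity` and `brute_force_helicity` are illustrative assumptions, not the authors' code. The same goes for the per-residue propagation weights `s` and the nucleation weight `sigma`.

```python
from itertools import product
from typing import Sequence


def zimm_bragg_helicity(s: Sequence[float], sigma: float) -> float:
    """Exact mean fractional helicity under a Zimm-Bragg-style model (sketch).

    s[i] is the helix propagation weight of residue i (one weight per residue,
    i.e. a heteropolymer); sigma is the helix nucleation weight. A helical
    residue contributes s[i] if the previous residue is helical and
    sigma * s[i] otherwise (including at the chain start); a coil residue
    contributes 1.
    """
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    # Forward pass: f[i][k] = total weight of all configurations of residues
    # 0..i that end with residue i in state k (k = 0 coil, k = 1 helix).
    f = [[0.0, 0.0] for _ in range(n)]
    f[0] = [1.0, sigma * s[0]]
    for i in range(1, n):
        prev_c, prev_h = f[i - 1]
        f[i][0] = prev_c + prev_h                    # coil can follow either state
        f[i][1] = (sigma * prev_c + prev_h) * s[i]   # helix: nucleate or propagate
    # Backward pass: b[i][k] = total weight of residues i+1..n-1
    # given that residue i is in state k.
    b = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt_c, nxt_h = b[i + 1]
        b[i][0] = nxt_c + sigma * s[i + 1] * nxt_h
        b[i][1] = nxt_c + s[i + 1] * nxt_h
    z = f[n - 1][0] + f[n - 1][1]                    # partition function
    # P(residue i helical) = f[i][1] * b[i][1] / z; average over the chain.
    return sum(f[i][1] * b[i][1] for i in range(n)) / (n * z)


def brute_force_helicity(s: Sequence[float], sigma: float) -> float:
    """Same quantity by enumerating all 2**n helix/coil states (small n only)."""
    n = len(s)
    z = 0.0
    total = 0.0
    for states in product((0, 1), repeat=n):
        w = 1.0
        prev = 0  # treat the position before the chain as coil
        for si, k in zip(s, states):
            if k == 1:
                w *= si if prev == 1 else sigma * si
            prev = k
        z += w
        total += w * sum(states)
    return total / (n * z)
```

Both passes in `zimm_bragg_helicity` run in O(n) time. `brute_force_helicity` is O(2^n) and is included only as a correctness check on short chains.

In the hidden-Markov-model analogy, the two helix/coil states act as the hidden states, and `f` and `b` play the role of unnormalized forward and backward variables. Long chains would normally need per-step rescaling or log-space arithmetic to avoid overflow; this sketch omits that.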
