Bayesian Protein Structure Prediction

The Human Genome Project has created an important role for statisticians in the emerging area of “structural bioinformatics”. Sequence analysis and structure prediction for biopolymers are crucial steps in turning newly sequenced genomic data into biologically and pharmaceutically relevant information in support of molecular medicine. We describe our work on Bayesian models for predicting protein structure from sequence, based on analysis of a database of experimentally determined protein structures. We have previously developed segment-based models of protein secondary structure that capture fundamental aspects of the protein folding process. These models achieve predictive performance at the level of the best available methods in the field (Schmidler et al., 2000). Here we show that this Bayesian framework generalizes naturally to incorporate information from non-local sequence interactions. We demonstrate this idea with a simple model for β-strand pairing and a Markov chain Monte Carlo (MCMC) algorithm for inference, and we apply the approach to the prediction of three-dimensional contacts for two example proteins.

[1] Douglas L. Brutlag, et al. Bayesian Segmentation of Protein Secondary Structure, 2000, J. Comput. Biol.

[2] Collin M. Stultz, et al. Structural analysis based on state‐space modeling, 1993, Protein Science: a publication of the Protein Society.

[3] B. Honig, et al. An algorithm to generate low-resolution protein tertiary structures from knowledge of secondary structure, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[4] Sylvia Richardson, et al. Markov Chain Monte Carlo in Practice, 1997.

[5] R. Aurora, et al. Helix capping, 1998, Protein Science: a publication of the Protein Society.

[6] P. Argos, et al. Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence, 1996, Protein Engineering.

[7] T. Hubbard, et al. Fold recognition and ab initio structure predictions using hidden Markov models and β‐strand pair potentials, 1995, Proteins.

[8] G. Montelione, et al. A banner year for membranes, 1999, Nature Structural Biology.

[9] R. King, et al. Identification and application of the concepts important for accurate and reliable protein secondary structure prediction, 1996, Protein Science: a publication of the Protein Society.

[10] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, 1995.

[11] V. A. Eyrich, et al. Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set, 1999, Journal of Molecular Biology.

[12] J. Thornton, et al. Determinants of strand register in antiparallel β‐sheets of proteins, 1998, Protein Science: a publication of the Protein Society.

[13] Scott R. Presnell, et al. Origins of structural diversity within sequentially identical hexapeptides, 1993, Protein Science: a publication of the Protein Society.

[14] Anders Krogh, et al. Prediction of Beta Sheets in Proteins, 1995, NIPS.

[15] G. J. Barton, et al. Protein secondary structure prediction, 1995, Current Opinion in Structural Biology.

[16] G. Rose, et al. Is protein folding hierarchic? I. Local structure and peptide folding, 1999, Trends in Biochemical Sciences.

[17] G. Barton, et al. Protein fold recognition by mapping predicted secondary structures, 1996, Journal of Molecular Biology.

[18] Arnold Neumaier, et al. Molecular Modeling of Proteins and Mathematical Prediction of Protein Structure, 1997, SIAM Rev.

[19] Douglas L. Brutlag, et al. Statistical models and Monte Carlo methods for protein structure prediction, 2002.

[20] K. Dill. Polymer principles and protein folding, 1999, Protein Science: a publication of the Protein Society.

[21] Jun S. Liu. Peskun's theorem and a modified discrete-state Gibbs sampler, 1996.

[22] C. Sander, et al. On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations, 1984, Proceedings of the National Academy of Sciences of the United States of America.

[23] D. Brutlag, et al. Discovering structural correlations in α‐helices, 1994.

[24] C. Sander, et al. Specific recognition in the tertiary structure of β-sheets of proteins, 1980.

[25] P. S. Kim, et al. Context-dependent secondary structure formation of a designed protein sequence, 1996, Nature.

[26] D. Fischer, et al. Protein fold recognition using sequence‐derived predictions, 1996, Protein Science: a publication of the Protein Society.

[27] A. Sali, et al. Structural genomics: beyond the Human Genome Project, 1999, Nature Genetics.

[28] Satoru Hayamizu, et al. Prediction of protein secondary structure by the hidden Markov model, 1993, Comput. Appl. Biosci.

[29] F. Collins, et al. New goals for the U.S. Human Genome Project: 1998-2003, 1998, Science.

[30] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.