A generative, probabilistic model of local protein structure

Despite significant progress in recent years, protein structure prediction remains one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model enables efficient conformational sampling and provides a framework for the rigorous analysis of local sequence–structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique because it avoids the drawbacks associated with a discrete and nonprobabilistic approach.
