Fragment‐HMM: A new approach to protein structure prediction

We designed a simple position‐specific hidden Markov model to predict protein structure. Our framework is naturally iterative: it repeats its own cycle until it converges to a final target, combining fragment assembly, clustering, target selection, refinement, and consensus in a single process. On all six standard benchmark proteins used to evaluate ROSETTA (described by Simons and colleagues in a recent paper), our initial implementation converges to within 6 Å of the native structure for 100% of decoys, whereas ROSETTA achieved only 14%–94% on the same data. Both the best decoys and the final decoys to which our method converges are also notably better than ROSETTA's.
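To make the idea of an iterative, self-converging loop concrete, here is a minimal Python sketch. It is not the authors' Fragment-HMM. It assumes a hypothetical three-letter structural alphabet (`STATES`) in place of real fragments and torsion angles, omits emissions entirely, and replaces the energy function with a toy score. The loop itself is the cross-entropy method swapped in for illustration: sample decoys from position-specific transition tables, keep the best-scoring fraction, refit the tables from that selection, and repeat. A per-position majority vote over the final selection stands in for the consensus step. All function names and parameters (`iterate`, `n_decoys`, `elite_frac`, `rounds`) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical structural alphabet: each hidden state stands for a class of
# local backbone conformation (helix-like, strand-like, loop-like).
STATES = ("H", "E", "L")


def uniform_model(length):
    """Position-specific transition tables: model[i][prev][cur] is the
    probability of state `cur` at position i given state `prev` at i - 1.
    Position 0 uses the pseudo-state None as its predecessor."""
    model = []
    for i in range(length):
        prevs = (None,) if i == 0 else STATES
        model.append({p: {s: 1.0 / len(STATES) for s in STATES} for p in prevs})
    return model


def sample(model, rng):
    """Draw one decoy (a hidden-state path) from left to right."""
    path, prev = [], None
    for table in model:
        row = table[prev]
        cur = rng.choices(STATES, weights=[row[s] for s in STATES])[0]
        path.append(cur)
        prev = cur
    return path


def reestimate(model, elite, pseudocount=0.1):
    """Refit every position's transition table from the selected decoys:
    transition counts plus a small pseudocount, normalized per row."""
    new_model = []
    for i, table in enumerate(model):
        counts = Counter((d[i - 1] if i > 0 else None, d[i]) for d in elite)
        new_table = {}
        for p in table:
            total = sum(counts[(p, s)] for s in STATES) + pseudocount * len(STATES)
            new_table[p] = {s: (counts[(p, s)] + pseudocount) / total for s in STATES}
        new_model.append(new_table)
    return new_model


def iterate(score, length, n_decoys=200, elite_frac=0.1, rounds=30, seed=0):
    """Sample decoys, keep the best-scoring fraction, refit the model, and
    repeat (rounds must be >= 1); return the per-position majority state
    of the final selection as the consensus."""
    rng = random.Random(seed)
    model = uniform_model(length)
    elite = []
    for _ in range(rounds):
        decoys = [sample(model, rng) for _ in range(n_decoys)]
        decoys.sort(key=score, reverse=True)
        elite = decoys[: max(1, int(elite_frac * n_decoys))]
        model = reestimate(model, elite)
    # Consensus: majority state at each position among the selected decoys.
    return [Counter(d[i] for d in elite).most_common(1)[0][0] for i in range(length)]


if __name__ == "__main__":
    target = list("HHHHLLEEEE")  # hypothetical "native" state path

    def match_score(decoy):
        # Toy score: number of positions agreeing with the target path.
        return sum(a == b for a, b in zip(decoy, target))

    print("".join(iterate(match_score, len(target))))
```

Because the refitted tables are position-specific, each position can learn its own preferred local conformation, and the pseudocount keeps every transition possible so that a position which drifts to the wrong state early can still recover in later rounds.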

[1] L. Baum et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. 1970.

[2] Yang Zhang et al. Template‐based modeling and free modeling by I‐TASSER in CASP7. Proteins, 2007.

[3] Babatunde A. Ogunnaike et al. A geometric invariant-based framework for the analysis of protein conformational space. Bioinformatics, 2005.

[4] Anders Krogh et al. Sampling Realistic Protein Conformations Using Local Structural Bias. PLoS Comput. Biol., 2006.

[5] Sung-Hou Kim et al. A method for evaluating the structural quality of protein models by using higher-order φ–ψ pairs scoring. 2006.

[6] Tao Jiang et al. The Regularized EM Algorithm. AAAI, 2005.

[7] M. J. Rooman et al. Automatic definition of recurrent local structure motifs in proteins. Journal of Molecular Biology, 1990.

[8] R. Zwanzig et al. Levinthal's paradox. Proceedings of the National Academy of Sciences of the United States of America, 1992.

[9] C. Bystroff et al. Three‐dimensional structures and contexts associated with recurrent amino acid sequence patterns. Protein Science, 1997.

[10] Gregory E. Sims et al. A method for evaluating the structural quality of protein models by using higher-order phi-psi pairs scoring. Proceedings of the National Academy of Sciences of the United States of America, 2006.

[11] G. J. Kleywegt et al. Recognition of spatial motifs in protein structures. Journal of Molecular Biology, 1999.

[12] Milton Abramowitz et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 1964.

[13] Thomas L. Madden et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.

[14] C. Kooperberg et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology, 1997.

[15] D. Baker et al. A surprising simplicity to protein folding. Nature, 2000.

[16] D. Baker et al. Local sequence-structure correlations in proteins. Current Opinion in Biotechnology, 1996.

[17] Yang Zhang et al. TASSER: An automated method for the prediction of protein tertiary structures in CASP6. Proteins, 2005.

[18] K. Dill. Theory for the folding and stability of globular proteins. Biochemistry, 1985.

[19] Shuai Cheng Li et al. Designing succinct structural alphabets. ISMB, 2008.

[20] Ming Li et al. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins, 2003.

[21] M. Levitt et al. Small libraries of protein fragments model native protein structures accurately. Journal of Molecular Biology, 2002.

[22] D. Shortle. Composites of local structure propensities: evidence for local encoding of long-range structure. Protein Science, 2002.

[23] V. Thorsson et al. HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. Journal of Molecular Biology, 2000.

[24] G. Rose et al. Building native protein conformation from highly approximate backbone torsion angles. Proceedings of the National Academy of Sciences of the United States of America, 2005.

[25] D. Baker et al. Prediction of local structure in proteins using a library of sequence-structure motifs. Journal of Molecular Biology, 1998.

[26] J. F. Boisvieux et al. Hidden Markov model approach for identifying the modular framework of the protein backbone. Protein Engineering, 1999.

[27] L. Pauling et al. The pleated sheet, a new layer configuration of polypeptide chains. Proceedings of the National Academy of Sciences of the United States of America, 1951.

[28] Harshinder Singh et al. Probabilistic model for two dependent circular variables. 2002.

[29] Andrej Sali et al. Minimalist representations and the importance of nearest neighbor effects in protein folding simulations. Journal of Molecular Biology, 2006.

[30] Radford M. Neal. Pattern Recognition and Machine Learning. Technometrics, 2007.

[31] G. N. Ramachandran et al. Conformation of polypeptides and proteins. Advances in Protein Chemistry, 1968.

[32] Jinbo Xu et al. Discriminative learning for protein conformation sampling. Proteins, 2008.

[33] K. Mardia et al. Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data. Biometrics, 2007.

[34] Ming Li et al. An Introduction to Kolmogorov Complexity and Its Applications. Texts in Computer Science, 1997.

[35] C. Levinthal. Are there pathways for protein folding? 1968.

[36] Jerry Tsai et al. Some fundamental aspects of building protein structures from fragment libraries. Protein Science, 2004.

[37] Ying Xu et al. Raptor: Optimal Protein Threading by Linear Programming. J. Bioinform. Comput. Biol., 2003.