A Hidden Markov Model applied to the protein 3D structure analysis

Understanding and predicting protein structures depends on the complexity and accuracy of the models used to represent them. A Hidden Markov Model has been set up to optimally compress the 3D conformation of proteins into a structural alphabet (SA), a limited library of representative SA-letters. Each SA-letter corresponds to a set of short local fragments of four Cα atoms that are similar both in geometry and in the way these fragments are concatenated to form a protein. Discretizing the local conformation of the protein backbone as a series of SA-letters simplifies the protein 3D coordinates into a unique 1D representation. Some evidence is presented that this approach is a relevant way to analyze protein architecture, in particular for protein structure comparison or prediction.
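The encoding idea described above can be illustrated with a minimal sketch. It is not the model of the paper: the two letters "H" and "E", their Gaussian emission parameters, the transition probabilities, and the choice of three inter-Cα distances as fragment descriptors are all hypothetical placeholders chosen for illustration. In practice such parameters would be estimated from a set of known structures. The sketch slides a window of four Cα atoms along the chain, describes each fragment numerically, and uses the Viterbi algorithm to find the most probable series of letters, so that both fragment geometry and letter-to-letter transitions contribute to the 1D encoding.

```python
import math

# Hypothetical two-letter alphabet with toy parameters (assumptions, not fitted values).
STATES = ("H", "E")
MEANS = {"H": (5.5, 5.2, 5.5), "E": (6.6, 9.9, 6.6)}   # mean distances in angstroms
SDS = {"H": (0.3, 0.5, 0.3), "E": (0.5, 1.0, 0.5)}     # standard deviations
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {r: {s: math.log(0.9 if r == s else 0.1) for s in STATES}
             for r in STATES}


def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fragment_descriptors(ca_coords):
    """Describe each overlapping four-C-alpha fragment by three distances
    (atoms 1-3, 1-4 and 2-4); this descriptor choice is an assumption."""
    out = []
    for i in range(len(ca_coords) - 3):
        a, b, c, d = ca_coords[i:i + 4]
        out.append((dist(a, c), dist(a, d), dist(b, d)))
    return out


def log_emission(state, x):
    """Log-density of descriptor x under an independent Gaussian per dimension."""
    return sum(-0.5 * ((xi - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
               for xi, m, s in zip(x, MEANS[state], SDS[state]))


def viterbi(obs):
    """Return the most probable letter path for a list of descriptors."""
    score = [{s: LOG_START[s] + log_emission(s, obs[0]) for s in STATES}]
    back = []
    for x in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + LOG_TRANS[r][s])
            cur[s] = prev[best] + LOG_TRANS[best][s] + log_emission(s, x)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


def encode(ca_coords):
    """Compress a C-alpha trace into a string of letters (one per fragment)."""
    return viterbi(fragment_descriptors(ca_coords))


# Toy input: six C-alpha atoms on a straight line, 3.8 angstroms apart.
straight = [(3.8 * i, 0.0, 0.0) for i in range(6)]
print(encode(straight))
```

A chain of n Cα atoms yields n - 3 overlapping fragments and therefore a string of n - 3 letters. Because the path is chosen jointly over the whole chain, a letter can be assigned partly on the strength of its neighbours rather than on the geometry of its fragment alone.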
