Hybrid Protein Model (HPM): a method to compact protein 3D-structure information and physicochemical properties

Predicting a protein's 3D structure from its 1D sequence is one of the main difficulties of structural biology. A structural alphabet was previously defined by applying an unsupervised classifier to the dihedral angles that describe the protein backbone. Its 16 protein blocks (PBs), the basic elements of the structural alphabet, allow a correct approximation of the 3D structure. Local structure prediction was then assessed with a Bayesian approach, which showed that sequence information strongly influences the local fold but that the prediction remains coarse (prediction rate of 40.7% when a single PB is predicted, 75.8% with the four most probable PBs). The Hybrid Protein Model presented in this study learns both the sequence and the structure of proteins. Analysis along the hybrid protein made it possible to locate more precisely certain types of amino acid residues within secondary structures and their flanking regions. This study leads to a fuzzy model of the dependence between sequence and structure.
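
The two prediction rates quoted in the abstract (one PB versus the four most probable PBs) can be read as top-1 and top-4 rates over per-position PB scores. The following is a minimal sketch of how such a rate could be computed, not the authors' implementation: the function name `top_k_rate`, the PB letter labels, and the toy scores are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
from typing import Dict, Sequence

# Hypothetical helper: the name and signature are illustrative only.
def top_k_rate(scores: Sequence[Dict[str, float]],
               truth: Sequence[str],
               k: int) -> float:
    """Fraction of positions whose true PB is among the k highest-scoring PBs."""
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    if not truth:
        return 0.0
    hits = 0
    for pos_scores, true_pb in zip(scores, truth):
        # Rank PB labels from most to least probable at this position.
        ranked = sorted(pos_scores, key=pos_scores.get, reverse=True)
        if true_pb in ranked[:k]:
            hits += 1
    return hits / len(truth)


# Invented toy scores for four positions (assumption: PBs labeled by letters).
toy_scores = [
    {"a": 0.50, "b": 0.20, "c": 0.15, "d": 0.10, "e": 0.05},
    {"a": 0.12, "b": 0.40, "c": 0.30, "d": 0.10, "e": 0.08},
    {"a": 0.30, "b": 0.25, "c": 0.20, "d": 0.15, "e": 0.10},
    {"m": 0.60, "a": 0.40},
]
toy_truth = ["a", "c", "e", "m"]

if __name__ == "__main__":
    print("top-1 rate:", top_k_rate(toy_scores, toy_truth, 1))
    print("top-4 rate:", top_k_rate(toy_scores, toy_truth, 4))
```

Widening the candidate list from one PB to four can only raise the rate, which matches the pattern in the reported figures: the correct block is often near the top of the ranking even when it is not ranked first.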
