A pentapeptide-based method for protein secondary structure prediction.

We present a new method for protein secondary structure prediction based on the recognition of well-defined pentapeptides in a large databank. Using a databank of 635 protein chains, we obtained a success rate of 68.6%. We show that the success rate improves when the databank is enlarged, when the 20 amino acids are adequately grouped into 10 sets, and when more pentapeptides are assigned one of the defined conformations, alpha-helix or beta-strand. Analysis of the model indicates that the essential variable is the number of pentapeptides of well-defined structure in the databank. Our model is simple, does not rely on arbitrary parameters, and allows the results of each chosen hypothesis to be analyzed in detail.
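The general idea of a pentapeptide lookup can be illustrated with a minimal sketch. It is not the authors' implementation: the 10-set grouping in `GROUPS`, the thresholds `min_count` and `min_purity`, the voting rule, and all function names are assumptions made for illustration only, since the abstract does not specify them. Conformations are written as `H` (alpha-helix), `E` (beta-strand) and `C` (anything else).

```python
from collections import Counter, defaultdict

# Hypothetical grouping of the 20 amino acids into 10 sets.
# This is an illustrative choice, NOT the grouping used in the paper.
GROUPS = ["AG", "ST", "C", "VIL", "M", "P", "FYW", "HKR", "DE", "NQ"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_alphabet(seq):
    """Rewrite a sequence using one symbol per amino acid group."""
    return "".join(AA_TO_GROUP[aa] for aa in seq)


def build_table(chains, min_count=2, min_purity=0.8):
    """Collect reduced pentapeptides that are consistently all-H or all-E.

    chains: iterable of (sequence, secondary_structure) string pairs.
    min_count and min_purity are assumed thresholds, not from the paper.
    """
    counts = defaultdict(Counter)
    for seq, ss in chains:
        reduced = reduce_alphabet(seq)
        for i in range(len(seq) - 4):
            window = ss[i:i + 5]
            # A window counts as H or E only if all five residues agree.
            label = window[0] if len(set(window)) == 1 else "mixed"
            counts[reduced[i:i + 5]][label] += 1
    table = {}
    for penta, seen in counts.items():
        label, n = seen.most_common(1)[0]
        total = sum(seen.values())
        if label in ("H", "E") and total >= min_count and n / total >= min_purity:
            table[penta] = label
    return table


def predict(seq, table):
    """Each recognized pentapeptide votes for its five residues; default is C."""
    reduced = reduce_alphabet(seq)
    votes = [Counter() for _ in seq]
    for i in range(len(seq) - 4):
        label = table.get(reduced[i:i + 5])
        if label:
            for j in range(i, i + 5):
                votes[j][label] += 1
    return "".join(v.most_common(1)[0][0] if v else "C" for v in votes)


# Toy databank of two invented chains, for demonstration only.
toy_chains = [("AAAAAAA", "HHHHHHH"), ("VVVVVVV", "EEEEEEE")]
toy_table = build_table(toy_chains)
print(predict("GGGGGG", toy_table))  # G shares a group with A in this sketch
```

In this sketch, grouping lets a pentapeptide never seen in the databank be recognized through its reduced form, which is one plausible reading of why grouping and a larger databank both increase the number of usable pentapeptides.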
