Predictions of protein segments with the same aminoacid sequence and different secondary structure: A benchmark for predictive methods

The most stringent test for predictive methods of protein secondary structure is whether identical short sequences that are known to be present with different conformations in different proteins known at atomic resolution can be correctly discriminated. In this study, we show that the prediction efficiency of this type of segments in unrelated proteins reaches an average accuracy per residue ranging from about 72 to 75% (depending on the alignment method used to generate the input sequence profile) only when methods of the third generation are used. A comparison of different methods based on segment statistics (2nd generation methods) and/or including also evolutionary information (3rd generation methods) indicate that the discrimination of the different conformations of identical segments is dependent on the method used for the prediction. Accuracy is similar when methods similarly performing on the secondary structure prediction are tested. When evolutionary information is taken into account as compared to single sequence input, the number of correctly discriminated pairs is increased twofold. The results also highlight the predictive capability of neural networks for identical segments whose conformation differs in different proteins. Proteins 2000;41:535–544. © 2000 Wiley‐Liss, Inc.

[1]  J. Garnier,et al.  Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. , 1978, Journal of molecular biology.

[2]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[3]  C Sander,et al.  On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. , 1984, Proceedings of the National Academy of Sciences of the United States of America.

[4]  G. Rose,et al.  Hydrophobicity of amino acid residues in globular proteins. , 1985, Science.

[5]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[6]  P Argos,et al.  Analysis of sequence-similar pentapeptides in unrelated protein tertiary structures. Strategies for protein folding and a guide for site-directed mutagenesis. , 1987, Journal of molecular biology.

[7]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[8]  M. Karplus,et al.  Protein secondary structure prediction with a neural network. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[9]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[10]  M. Sippl Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. , 1990, Journal of molecular biology.

[11]  M. Sippl Calculation of conformational ensembles from potentials of mena force , 1990 .

[12]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[13]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[14]  Scott R. Presnell,et al.  Origins of structural diversity within sequentially identical hexapeptides , 1993, Protein science : a publication of the Protein Society.

[15]  B. Rost,et al.  Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.

[16]  B. Rost,et al.  Conservation and prediction of solvent accessibility in protein families , 1994, Proteins.

[17]  A A Salamov,et al.  Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. , 1995, Journal of molecular biology.

[18]  S. Bryant,et al.  Statistics of sequence-structure threading. , 1995, Current opinion in structural biology.

[19]  M. Gruebele,et al.  Direct observation of fast protein folding: the initial collapse of apomyoglobin. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[20]  D Baker,et al.  Global properties of the mapping between local amino acid sequence and local structure in proteins. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[21]  J. Gibrat,et al.  GOR method for predicting protein secondary structure from amino acid sequence. , 1996, Methods in enzymology.

[22]  R. King,et al.  Identification and application of the concepts important for accurate and reliable protein secondary structure prediction , 1996, Protein science : a publication of the Protein Society.

[23]  A. Fersht,et al.  Initiation sites of protein folding by NMR analysis. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[24]  P. S. Kim,et al.  Context-dependent secondary structure formation of a designed protein sequence , 1996, Nature.

[25]  R. Casadio,et al.  Noise and randomlike behavior of perceptrons: Theory and applicationto protein structure prediction , 1997 .

[26]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[27]  P. Argos,et al.  Seventy‐five percent accuracy in protein secondary structure prediction , 1997, Proteins.

[28]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[29]  P. Munson,et al.  Statistical significance of hierarchical multi‐body potentials based on Delaunay tessellation and their application in sequence‐structure alignment , 1997, Protein science : a publication of the Protein Society.

[30]  Chris Sander,et al.  The HSSP database of protein structure-sequence alignments and family profiles , 1998, Nucleic Acids Res..

[31]  P G Wolynes,et al.  Protein folding mechanisms and the multidimensional folding funnel , 1998, Proteins.

[32]  G J Barton,et al.  Protein sequence alignment techniques. , 1998, Acta crystallographica. Section D, Biological crystallography.

[33]  S. Sudarsanam,et al.  Structural diversity of sequentially identical subsequences of proteins: Identical octapeptides can have different conformations , 1998, Proteins.

[34]  B. Rost,et al.  Marrying structure and genomics. , 1998, Structure.

[35]  M Mezei,et al.  Chameleon sequences in the PDB. , 1998, Protein engineering.

[36]  K. Dill Polymer principles and protein folding , 1999, Protein science : a publication of the Protein Society.

[37]  B. Rost,et al.  A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment , 1999, Proteins.

[38]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[39]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[40]  J M Chandonia,et al.  New methods for accurate prediction of protein secondary structure , 1999, Proteins.

[41]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[42]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 , 2000, Nucleic Acids Res..