An Information Measure of the Quality of Protein Secondary Structure Prediction

We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple yet intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view as to why a prediction succeeds and fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and perhaps has unfairly given the general impression that secondary structure prediction has reached its limit since SOV hasn't improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that predictions targets have been more difficult in successive CASP experiments, yet the predictions quality has continued to improve measurably over each round. In terms of information, the secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective of accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of difficulty as well as final quality of a prediction.

[1]  Burkhard Rost,et al.  The PredictProtein server , 2003, Nucleic Acids Res..

[2]  B. Ripley,et al.  E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics , 1983 .

[3]  Roland L. Dunbrack,et al.  CAFASP3: The third critical assessment of fully automated structure prediction methods , 2003, Proteins.

[4]  S. Bryant,et al.  Critical assessment of methods of protein structure prediction (CASP): Round II , 1997, Proteins.

[5]  R. M. Swanson Entropy measures amount of choice , 1990 .

[6]  B. Rost,et al.  Redefining the goals of protein secondary structure prediction. , 1994, Journal of molecular biology.

[7]  Marina Vannucci,et al.  Information theory provides a comprehensive framework for the evaluation of protein structure predictions , 2009, Proteins.

[8]  Marc A. Martí-Renom,et al.  EVA: continuous automatic evaluation of protein structure prediction servers , 2001, Bioinform..

[9]  Alfonso Valencia,et al.  CAFASP3 in the spotlight of EVA , 2003, Proteins.

[10]  Volker A. Eyrich,et al.  EVA: Large‐scale analysis of secondary structure prediction , 2001, Proteins.

[11]  D Fischer,et al.  CAFASP‐1: Critical assessment of fully automated structure prediction methods , 1999, Proteins.

[12]  Pierre Baldi,et al.  Assessing the accuracy of prediction algorithms for classification: an overview , 2000, Bioinform..

[13]  Pierre Baldi,et al.  Bioinformatics - the machine learning approach (2. ed.) , 2000 .

[14]  B. Rost,et al.  A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment , 1999, Proteins.

[15]  P. Y. Chou,et al.  Prediction of protein conformation. , 1974, Biochemistry.

[16]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[17]  E. Jaynes,et al.  Confidence Intervals vs Bayesian Intervals , 1976 .

[18]  K Fidelis,et al.  A large‐scale experiment to assess protein structure prediction methods , 1995, Proteins.