Improving protein secondary structure predictions by prediction fusion

Protein secondary structure prediction remains a challenging problem. Although a number of prediction methods have been presented in the literature, the prediction tools available on-line produce results whose quality is not always fully satisfactory. Consequently, a user has to know which predictor to use for the protein to be analyzed. In this paper, we propose a server implementing a method to improve the accuracy of protein secondary structure prediction. The method integrates the prediction results computed by a number of available on-line prediction tools to obtain a combined prediction of higher quality. Given an input protein p whose secondary structure has to be predicted, and a group of proteins F whose secondary structures are known, the server currently works according to a two-phase approach: (i) it selects a set of predictors that are good at predicting the secondary structure of the proteins in F (and therefore, supposedly, that of p as well), and (ii) it integrates the prediction results delivered for p by the selected team of prediction tools. By exploiting our system, the user is thus relieved of the burden of selecting the most appropriate predictor for the given input protein and is, at the same time, assured that a prediction result at least as good as the best available one will be delivered. The correctness of the resulting prediction is measured with reference to the EVA accuracy parameters used in several editions of CASP.
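To make the two-phase approach concrete, the following is a minimal sketch in Python, not the server's actual implementation. It assumes three-state predictions given as strings over H, E and C, uses the per-residue Q3 score as a simple stand-in for the EVA accuracy parameters, and fuses predictions by an accuracy-weighted per-residue vote. The abstract does not specify the integration rule, so the voting scheme and all function and parameter names (`q3`, `select_team`, `fuse`, `team_size`) are assumptions introduced for illustration.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed structures differ in length")
    if not observed:
        return 0.0
    return sum(a == b for a, b in zip(predicted, observed)) / len(observed)


def select_team(predictions_on_f, known_structures, team_size):
    """Phase (i): rank predictors by mean Q3 over F and keep the best ones.

    predictions_on_f maps predictor name -> {protein id -> predicted string};
    known_structures maps protein id -> observed string (hypothetical layout).
    """
    scores = {}
    for name, preds in predictions_on_f.items():
        total = sum(q3(preds[pid], known_structures[pid]) for pid in known_structures)
        scores[name] = total / len(known_structures)
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:team_size], scores


def fuse(predictions_for_p, team, scores):
    """Phase (ii): per-residue vote, each predictor weighted by its score on F.

    This weighted vote is an assumed integration rule, not the paper's.
    """
    length = len(predictions_for_p[team[0]])
    fused = []
    for i in range(length):
        votes = Counter()
        for name in team:
            votes[predictions_for_p[name][i]] += scores[name]
        # Iterate labels in sorted order so ties resolve deterministically.
        fused.append(max(sorted(votes), key=votes.__getitem__))
    return "".join(fused)


if __name__ == "__main__":
    known = {"f1": "HHHHCCEE"}
    on_f = {
        "A": {"f1": "HHHHCCEE"},
        "B": {"f1": "HHHHCCCC"},
        "C": {"f1": "CCCCCCCC"},
    }
    team, scores = select_team(on_f, known, team_size=2)
    for_p = {"A": "HHCE", "B": "HCCE", "C": "CCCC"}
    print(team, fuse(for_p, team, scores))
```

In this toy run the weakest predictor on F is left out of the team, and where the two remaining predictors disagree on a residue, the one that scored higher on F decides the label. A simple vote like this does not by itself guarantee a result at least as good as the best single predictor.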
