Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: Combining GOR V and Fragment Database Mining (FDM)

One of the challenges in protein secondary structure prediction is to overcome the cross-validated 80% prediction accuracy barrier. Here, we propose a novel approach to surpass this barrier. Instead of using a single algorithm that relies on a limited data set for training, we combine two complementary methods with different strengths: Fragment Database Mining (FDM) and GOR V. FDM exploits the known protein structures in the Protein Data Bank (PDB) and provides highly accurate secondary structure predictions when sequentially similar structural fragments are identified. In contrast, the GOR V algorithm uses information theory, Bayesian statistics, and PSI-BLAST multiple sequence alignments to predict the secondary structure of residues inside a sliding window along a protein chain. Combining these two methods takes advantage of the large number of structures in the PDB and significantly improves secondary structure prediction accuracy, yielding Q3 values ranging from 67.5% to 93.2%, depending on the availability of highly similar fragments in the PDB.
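To make the Q3 measure and the idea of a consensus concrete, the following is a minimal Python sketch, not the server's implementation. Q3 is the standard three-state accuracy: the percentage of residues whose helix (H), strand (E), or coil (C) label is predicted correctly. The combination rule shown here is an assumption made for illustration only: the FDM label is used wherever a per-residue fragment similarity score reaches a cutoff, and the GOR V label is used otherwise. The function names, the `fdm_similarity` input, and the 0.9 threshold are all hypothetical.

```python
from typing import Optional, Sequence


def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: percentage of residues with the correct H/E/C label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def consensus(
    fdm_pred: str,
    fdm_similarity: Sequence[Optional[float]],
    gor_pred: str,
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> str:
    """Per residue, take the FDM label if a sufficiently similar fragment
    was found (assumed rule); otherwise fall back to the GOR V label."""
    if not (len(fdm_pred) == len(fdm_similarity) == len(gor_pred)):
        raise ValueError("all inputs must cover the same residues")
    return "".join(
        f if (s is not None and s >= threshold) else g
        for f, s, g in zip(fdm_pred, fdm_similarity, gor_pred)
    )


# Toy data, invented for illustration (not results from the paper).
observed = "CHHHHEEC"
gor_pred = "CHHHCEEC"
fdm_pred = "CHHHHCCC"
# None marks a residue for which no similar fragment was found.
fdm_similarity = [0.95, 0.95, 0.95, 0.95, 0.95, 0.2, 0.2, None]

combined = consensus(fdm_pred, fdm_similarity, gor_pred)
print(q3(gor_pred, observed), q3(fdm_pred, observed), q3(combined, observed))
```

In this toy case each method alone mislabels at least one residue, while the combined string recovers all eight, which mirrors the dependence of accuracy on the availability of highly similar fragments described above.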
