Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information.

Predicting the secondary structure of a protein from its sequence still remains a challenging problem. The prediction accuracies remain around 80 %, and for very diverse methods. Using evolutionary information and machine learning algorithms in particular has had the most impact. In this chapter, we will first define secondary structures, then we will review the Consensus Data Mining (CDM) technique based on the robust GOR algorithm and Fragment Database Mining (FDM) approach. GOR V is an empirical method utilizing a sliding window approach to model the secondary structural elements of a protein by making use of generalized evolutionary information. FDM uses data mining from experimental structure fragments, and is able to successfully predict the secondary structure of a protein by combining experimentally determined structural fragments based on sequence similarities of the fragments. The CDM method combines predictions from GOR V and FDM in a hierarchical manner to produce consensus predictions for secondary structure. In other words, if sequence fragment are not available, then it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/ .

[1]  V A Simossis,et al.  Integrating protein secondary structure prediction and multiple sequence alignment. , 2004, Current protein & peptide science.

[2]  J. Garnier,et al.  Improvements in a secondary structure prediction method based on a search for local sequence homologies and its use as a model building tool. , 1988, Biochimica et biophysica acta.

[3]  J. Garnier,et al.  Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. , 1978, Journal of molecular biology.

[4]  G J Barton,et al.  Application of multiple sequence alignment profiles to improve protein secondary structure prediction , 2000, Proteins.

[5]  Andrzej Kloczkowski,et al.  Protein secondary structure prediction based on the GOR algorithm incorporating multiple sequence alignment information , 2002 .

[6]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[7]  Barry Robson,et al.  An algorithm for secondary structure determination in proteins based on sequence similarity , 1986, FEBS letters.

[8]  Lukasz A. Kurgan,et al.  SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles , 2012, J. Comput. Chem..

[9]  M. Karplus,et al.  Protein secondary structure prediction with a neural network. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Burkhard Rost,et al.  Prediction in 1D: secondary structure, membrane helices, and accessibility. , 2003, Methods of biochemical analysis.

[11]  K Henrick,et al.  Electronic Reprint Biological Crystallography Secondary-structure Matching (ssm), a New Tool for Fast Protein Structure Alignment in Three Dimensions Biological Crystallography Secondary-structure Matching (ssm), a New Tool for Fast Protein Structure Alignment in Three Dimensions , 2022 .

[12]  Johannes Söding,et al.  The HHpred interactive server for protein homology detection and structure prediction , 2005, Nucleic Acids Res..

[13]  P. Argos,et al.  Seventy‐five percent accuracy in protein secondary structure prediction , 1997, Proteins.

[14]  P. Y. Chou,et al.  Prediction of protein conformation. , 1974, Biochemistry.

[15]  Satoru Hayamizu,et al.  Prediction of protein secondary structure by the hidden Markov model , 1993, Comput. Appl. Biosci..

[16]  S Salzberg,et al.  Predicting protein secondary structure with a nearest-neighbor algorithm. , 1992, Journal of molecular biology.

[17]  B. Rost Review: protein secondary structure prediction continues to rise. , 2001, Journal of structural biology.

[18]  P Stolorz,et al.  Predicting protein secondary structure using neural net and statistical methods. , 1992, Journal of molecular biology.

[19]  Andrzej Kloczkowski,et al.  GOR V server for protein secondary structure prediction , 2005, Bioinform..

[20]  Kevin Karplus,et al.  SAM-T08, HMM-based protein structure prediction , 2009, Nucleic Acids Res..

[21]  Taner Z Sen,et al.  combining GOR V and Fragment Database Mining A Consensus Data Mining secondary structure prediction by , 2006 .

[22]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[23]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[24]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[25]  Shuai Cheng Li,et al.  Fragment‐HMM: A new approach to protein structure prediction , 2008, Protein science : a publication of the Protein Society.

[26]  J. Gibrat,et al.  Secondary structure prediction: combination of three different methods. , 1988, Protein engineering.

[27]  James G. Lyons,et al.  Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning , 2015, Scientific Reports.

[28]  A A Salamov,et al.  Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. , 1995, Journal of molecular biology.

[29]  Jiang Xie,et al.  PRT-HMM: A Novel Hidden Markov Model for Protein Secondary Structure Prediction , 2012, 2012 IEEE/ACIS 11th International Conference on Computer and Information Science.

[30]  Burkhard Rost,et al.  PHD - an automatic mail server for protein secondary structure prediction , 1994, Comput. Appl. Biosci..

[31]  Taner Z Sen,et al.  Prediction of protein secondary structure by mining structural fragment database. , 2005, Polymer.

[32]  Haitao Cheng,et al.  Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: Combining GOR V and Fragment Database Mining (FDM) , 2007, Bioinform..

[33]  P. Argos,et al.  Knowledge‐based protein secondary structure assignment , 1995, Proteins.

[34]  K-L Ting,et al.  Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence , 2002, Proteins.

[35]  K Fidelis,et al.  A large‐scale experiment to assess protein structure prediction methods , 1995, Proteins.

[36]  O. Lund,et al.  Prediction of protein secondary structure at 80% accuracy , 2000, Proteins.

[37]  E. Lander,et al.  Protein secondary structure prediction using nearest-neighbor methods. , 1993, Journal of molecular biology.

[38]  A A Salamov,et al.  Protein secondary structure prediction using local alignments. , 1997, Journal of molecular biology.

[39]  V. Thorsson,et al.  HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. , 2000, Journal of molecular biology.

[40]  C Sander,et al.  Third generation prediction of secondary structures. , 2000, Methods in molecular biology.

[41]  V. Lim Algorithms for prediction of α-helical and β-structural regions in globular proteins , 1974 .

[42]  V. Lim Structural principles of the globular organization of protein chains. A stereochemical theory of globular protein secondary structure. , 1974, Journal of molecular biology.

[43]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[44]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.