Improving protein secondary structure prediction using a multi-modal BP method

Methods for predicting protein secondary structure provide information that is useful both in ab initio structure prediction and as additional restraints for fold recognition algorithms. Secondary structure predictions may also be used to guide the design of site-directed mutagenesis studies and to locate potentially functionally important residues. In this article, we propose a multi-modal back propagation neural network (MMBP) method for predicting protein secondary structure. Using the Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM), we have constructed a compound pyramid model (CPM) composed of three layers of intelligent interfaces that integrate the MMBP, a mixed-modal SVM (MMS), a modified Knowledge Discovery in Databases (KDD*) process, and other components. CPM is available both as an integrated web server and as a standalone application, and it exploits recent advances in knowledge discovery and machine learning to predict protein secondary structure with high accuracy. On a non-redundant test dataset of 256 proteins from RCASP256, the CPM method achieves an average Q3 score of 86.13% (SOV99 = 84.66%). Extensive testing indicates that this is significantly better than any other currently available method. Assessments on the RS126 and CB513 datasets indicate that the CPM method achieves average Q3 scores approaching 83.99% (SOV99 = 80.25%) and 85.58% (SOV99 = 81.15%), respectively. By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called CPM, which performs these secondary structure predictions, is accessible at http://kdd.ustb.edu.cn/protein_Web/.
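
The Q3 figures quoted above are three-state per-residue accuracies: the percentage of residues whose helix (H), strand (E) or coil (C) label is predicted correctly. The following is a minimal sketch of that metric only, not the CPM evaluation code. The function names `q3_score` and `mean_q3` and the example strings are illustrative assumptions, and the average is assumed to be a plain mean over proteins.

```python
from typing import Iterable, Tuple


def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand)
    and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-protein Q3 over (predicted, observed) pairs.

    Assumption: an unweighted mean over proteins; a per-residue average
    over the whole dataset would weight long proteins more heavily.
    """
    scores = [q3_score(pred, obs) for pred, obs in pairs]
    if not scores:
        raise ValueError("at least one protein is required")
    return sum(scores) / len(scores)


# Hypothetical 10-residue example: 8 of 10 states agree.
example = q3_score("HHHHCCEEEC", "HHHCCCEEEE")
print(f"Q3 = {example:.2f}%")
```

Q3 treats every residue independently, which is why segment-based measures such as SOV99 are usually reported alongside it, as in the results above.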
