Protein secondary structure prediction using DWKF based on SVR-NSGAII

Predicting protein secondary structure is an important step towards elucidating a protein's three-dimensional structure and function, and it remains a challenging problem in bioinformatics. Machine learning has gone some way towards addressing this challenge. In the machine learning and data mining literature, regression and classification are typically viewed as two distinct problems, differentiated by whether the dependent variable is continuous or categorical. There have been efforts to use regression methods to solve classification problems and vice versa. To treat a classification problem as a regression one, we propose a method based on a Support Vector Regression (SVR) classification model, one of the powerful methods in the field of machine intelligence. We apply the Non-dominated Sorting Genetic Algorithm II (NSGAII) to find mapping points (MPs) for rounding a real value to an integer class label. NSGAII is also used to tune the SVR kernel parameters in order to enhance the performance of our model. Choosing a suitable SVR kernel function for a particular problem can improve prediction results remarkably, but no single kernel predicts all protein secondary structure classes with acceptable accuracy. We therefore use a Dynamic Weighted Kernel Fusion (DWKF) method to fuse three SVR kernels and achieve better performance. To further improve our method, Position-Specific Scoring Matrix (PSSM) profiles are used as its input. The goals of this research are to tune SVR parameters and to fuse the outputs of different SVR kernels in order to determine protein secondary structure classes accurately. The classification accuracies obtained by our method are 85.79% and 84.94% on the RS126 and CB513 datasets, respectively, which are promising compared with other classification methods in the literature.
Moreover, to gauge the behavior of our method in comparison with other state-of-the-art methods, an independent dataset is used, on which the method achieves 81.4% accuracy. Our method does not achieve the best value for any of the considered performance metrics on the independent dataset, but its values for all metrics are quite acceptable.
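
The two decision steps named in the abstract, fusing the real-valued outputs of several SVR kernels and then converting the fused value into a class with mapping points, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class encoding (H, E, C in that order), the example mapping points, the example weights, and the helper names `fuse_outputs` and `to_class`. The abstract does not state how DWKF computes its weights or how NSGAII searches for the mapping points, so both are simply passed in as inputs.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical ordering of the three secondary-structure classes
# (helix, strand, coil); the paper's actual encoding is not given here.
CLASSES = ("H", "E", "C")


def fuse_outputs(kernel_outputs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of the real-valued outputs of several SVR kernels.

    The weights are assumed to be supplied by the caller; how DWKF derives
    them dynamically is not described in the abstract.
    """
    if len(kernel_outputs) != len(weights):
        raise ValueError("need exactly one weight per kernel output")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for w, y in zip(weights, kernel_outputs)) / total


def to_class(value: float, mapping_points: Sequence[float]) -> str:
    """Map a real value to a class label, using sorted mapping points as cut-offs.

    With mapping points (m1, m2): value < m1 gives class 0,
    m1 <= value < m2 gives class 1, and value >= m2 gives class 2.
    """
    if len(mapping_points) != len(CLASSES) - 1:
        raise ValueError("need one mapping point fewer than the number of classes")
    return CLASSES[bisect_right(mapping_points, value)]


# Illustrative values only: three kernel outputs for one residue,
# assumed weights, and assumed mapping points.
fused = fuse_outputs([0.1, 0.4, 1.3], [0.5, 0.3, 0.2])
label = to_class(fused, (0.5, 1.5))
```

In this sketch the mapping points play the role of tunable thresholds: moving them changes which residues fall into each class without retraining the regressors, which is why they are a natural target for an optimizer such as NSGAII.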
