A new hybrid coding for protein secondary structure prediction based on primary structure similarity.

The way amino acid sequences are encoded can greatly affect the accuracy of protein secondary structure prediction. In this paper, we propose a novel hybrid coding method based on the physicochemical properties of amino acids and their tendency factors. Principal component analysis (PCA) is first applied to the physicochemical properties of the amino acids to construct a 3-bit code, and the three tendency factors of the amino acids are then calculated to generate a second 3-bit code. The two 3-bit codes are fused to form a hybrid 6-bit code. Before predicting secondary structure, we also perform a geometry-based similarity comparison of protein primary structure between the reference set and the test set. Finally, a support vector machine (SVM) is used to predict the amino acids that are not detected by the primary structure similarity comparison. Experimental results show that the method achieves a satisfactory improvement in the accuracy of protein secondary structure prediction.
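The hybrid coding step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the physicochemical property table is a random placeholder (the abstract does not list the properties used), the tendency factor is taken to be a Chou-Fasman-style propensity P(state | residue) / P(state) with a pseudocount, each "3-bit code" is treated as three real-valued components, and all function names are hypothetical. The similarity comparison and the SVM stage are not shown.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"  # helix, strand, coil (assumed three-state labels)


def pca_code(properties, n_components=3):
    """Project standardized per-residue properties onto the top principal axes."""
    X = np.asarray(properties, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T


def tendency_factors(sequences, structures, pseudocount=1.0):
    """Assumed definition: P(state | residue) / P(state), smoothed by a pseudocount."""
    counts = np.full((len(AMINO_ACIDS), len(STATES)), pseudocount)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[AMINO_ACIDS.index(aa), STATES.index(state)] += 1
    p_state_given_aa = counts / counts.sum(axis=1, keepdims=True)
    p_state = counts.sum(axis=0) / counts.sum()
    return p_state_given_aa / p_state


def hybrid_code(properties, sequences, structures):
    """Fuse the 3 PCA components and 3 tendency factors into a 20 x 6 table."""
    return np.hstack([pca_code(properties),
                      tendency_factors(sequences, structures)])


def encode(sequence, table):
    """Look up the 6-component code for every residue in a sequence."""
    return np.array([table[AMINO_ACIDS.index(aa)] for aa in sequence])


# Placeholder inputs only: random "properties" and a toy labeled reference set.
rng = np.random.default_rng(0)
placeholder_properties = rng.normal(size=(20, 8))
table = hybrid_code(placeholder_properties,
                    ["AAAAGGGG", "LLKKEE"],
                    ["HHHHEEEE", "HHCCCC"])
features = encode("AGLK", table)
print(table.shape, features.shape)
```

Here `table` holds one 6-component row per amino acid and `features` holds one row per residue of the query sequence. In a full pipeline such rows would typically be gathered over a sliding window and passed to an SVM, but the window size and kernel are not given in the abstract.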
