Exploring Potential Discriminatory Information Embedded in PSSM to Enhance Protein Structural Class Prediction Accuracy

Determining the structural class of a given protein can provide important information about its functionality and its general tertiary structure. In the last two decades, the protein structural class prediction problem has attracted tremendous attention, and prediction accuracy has improved significantly. Features extracted from the Position Specific Scoring Matrix (PSSM) have played an important role in achieving this improvement. However, this information has not been adequately explored: the accuracy of methods that rely on the PSSM for feature extraction remains limited. In this study, to explore this potential, we propose a segmentation-based feature extraction technique based on the concepts of amino acid distribution and auto covariance. By applying a Support Vector Machine (SVM) to our extracted features, we enhance protein structural class prediction accuracy by up to 16% over similar studies found in the literature. We achieve prediction accuracies of over 90% and 80% for the 25PDB and 1189 benchmarks, respectively, relying solely on the PSSM for feature extraction.
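To make the idea of segment-wise auto covariance concrete, the following is a minimal sketch and not the authors' exact formulation. It assumes the PSSM is an L x 20 array (one row per residue), that segments are contiguous blocks of roughly equal length, and that auto covariance is computed per column for lags 1 to `max_lag`. The function names `auto_covariance` and `segmented_features`, and the parameters `n_segments` and `max_lag`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def auto_covariance(pssm, max_lag):
    """Per-column auto covariance of a PSSM block for lags 1..max_lag.

    Assumption: pssm has shape (length, n_columns); the result is a flat
    vector of size max_lag * n_columns, ordered lag by lag.
    """
    pssm = np.asarray(pssm, dtype=float)
    length = pssm.shape[0]
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the block length")
    # Center each column on its mean over the block.
    centered = pssm - pssm.mean(axis=0)
    features = []
    for lag in range(1, max_lag + 1):
        # Products of centered scores that are `lag` positions apart.
        products = centered[:-lag] * centered[lag:]
        features.append(products.sum(axis=0) / (length - lag))
    return np.concatenate(features)


def segmented_features(pssm, n_segments, max_lag):
    """Concatenate auto covariance features from contiguous segments.

    Assumption: equal-length splitting along the sequence is used here;
    the paper's segmentation rule may differ.
    """
    pssm = np.asarray(pssm, dtype=float)
    segments = np.array_split(pssm, n_segments, axis=0)
    return np.concatenate([auto_covariance(s, max_lag) for s in segments])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_pssm = rng.normal(size=(60, 20))  # synthetic stand-in for a real PSSM
    vector = segmented_features(toy_pssm, n_segments=3, max_lag=4)
    print(vector.shape)
```

Under these assumptions, each protein maps to a fixed-length vector of `n_segments * max_lag * 20` values regardless of sequence length, which is the property that lets a standard classifier such as an SVM be trained on proteins of varying length.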
