Two multi-classification strategies used on SVM to predict protein structural classes by using auto covariance

Machine learning methods play a very important role in protein secondary structure prediction and related tasks. For a given learning approach, prediction quality depends mostly on how protein sequences are represented as numeric features. In this paper, two Support Vector Machine (SVM) multi-classification strategies, “one-against-one” (1-a-1) and “one-against-all” (1-a-a), were used to identify protein structural classes. Auto covariance (AC), which transforms the physicochemical properties of the amino acids in a protein into a data matrix, captures the neighboring effects and the interactions between residues in the sequence. Applying the “1-a-1” approach on SVM to predict protein structural classes gave a very promising overall accuracy of 90.69% in the jackknife test, more than 10% higher than the accuracy obtained with “1-a-a”. The experimental results show that the SVM predictor constructed with “1-a-1” can avoid biased prediction accuracy. The current method, which describes protein primary sequence information by auto covariance (AC) and applies the “1-a-1” approach on SVM, should play an important complementary role in related applications.
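To make the AC encoding concrete, here is a minimal sketch of the auto covariance transform in the form commonly used for sequence-based SVM features: for each lag and each property j, AC averages the product of the deviations of residue i and residue i + lag from the sequence mean of that property. The abstract does not state which physicochemical properties or which maximum lag the paper uses, so the function name `auto_covariance`, the property table, and the lag values below are hypothetical, and the property values are assumed to be normalized beforehand.

```python
from statistics import mean


def auto_covariance(seq, props, max_lag):
    """Auto covariance (AC) features of a protein sequence (illustrative sketch).

    AC(lag, j) = 1/(n - lag) * sum_{i=1}^{n-lag} (X[i][j] - mean_j) * (X[i+lag][j] - mean_j)

    seq     -- protein sequence, one letter per residue
    props   -- hypothetical table: residue -> list of m (pre-normalized) property values
    max_lag -- largest lag considered; must be smaller than len(seq)
    Returns a flat list of max_lag * m values, ordered by lag, then by property.
    """
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    X = [props[aa] for aa in seq]  # n x m data matrix of property values
    m = len(X[0])
    # Mean of each property over the whole sequence
    col_means = [mean(row[j] for row in X) for j in range(m)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            # Covariance between residues that are `lag` positions apart
            s = sum((X[i][j] - col_means[j]) * (X[i + lag][j] - col_means[j])
                    for i in range(n - lag))
            features.append(s / (n - lag))
    return features


# Toy one-property table (hypothetical values, not real physicochemical data)
print(auto_covariance("AGAG", {"A": [1.0], "G": [-1.0]}, 2))  # [-1.0, 1.0]
```

Because the output length depends only on the number of lags and properties, sequences of different lengths map to fixed-length vectors, which is what an SVM needs as input.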

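The two multi-classification strategies differ in how binary SVMs are combined. With k classes, “1-a-1” trains k(k-1)/2 pairwise classifiers and predicts by majority vote, while “1-a-a” trains k classifiers (each class against all others) and predicts the class with the largest decision value. The sketch below shows only these combination rules; the trained SVMs are replaced by hypothetical distance-based stand-ins, and the four class labels are assumed to follow the common all-α / all-β / α/β / α+β scheme. The helper names are illustrative, not from the paper.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """'1-a-1': pairwise[(a, b)](x) > 0 is a vote for a, otherwise for b."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    # max() keeps the first maximal class, so ties go to the class listed first
    return max(classes, key=lambda c: votes[c])


def one_vs_rest_predict(x, classes, scorers):
    """'1-a-a': pick the class whose 'class vs. rest' decision value is largest."""
    return max(classes, key=lambda c: scorers[c](x))


# Hypothetical stand-ins for trained SVMs: each class has a 1-D "centroid".
CENTROIDS = {"all-alpha": 0.0, "all-beta": 1.0, "alpha/beta": 2.0, "alpha+beta": 3.0}
CLASSES = list(CENTROIDS)


def make_pairwise(a, b):
    # Positive when x is closer to class a's centroid than to class b's
    return lambda x: abs(x - CENTROIDS[b]) - abs(x - CENTROIDS[a])


PAIRWISE = {(a, b): make_pairwise(a, b) for a, b in combinations(CLASSES, 2)}
SCORERS = {c: (lambda x, c=c: -abs(x - CENTROIDS[c])) for c in CLASSES}

print(len(PAIRWISE), len(SCORERS))                 # 6 4
print(one_vs_one_predict(0.9, CLASSES, PAIRWISE))  # all-beta
print(one_vs_rest_predict(2.2, CLASSES, SCORERS))  # alpha/beta
```

In practice, each pairwise classifier in “1-a-1” is trained only on the samples of its two classes, so the individual problems are smaller and less imbalanced than the “one class against all others” problems of “1-a-a”.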