A Protein Classifier Based on SVM by Using the Voxel Based Descriptor

The tertiary structure of a protein molecule is the main factor that determines its function. All the information a protein needs to fold into its native structure is encoded in its amino acid sequence, and the way this sequence folds in 3D space can be used to determine the protein's function. With technological advances, the number of determined protein structures increases every day, so improving the efficiency of protein structure retrieval and classification methods has become an important research issue. In this paper, we propose a novel protein classifier that considers the conformation of the protein structure in 3D space. Specifically, our voxel-based protein descriptor is used to represent the protein structures, and the Support Vector Machine (SVM) method is then used to classify them. The results show that our classifier achieves 78.83% accuracy while running faster than other algorithms of comparable accuracy.
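To make the idea of a voxel-based representation concrete, the following is a minimal sketch of one way atom coordinates could be turned into a fixed-length occupancy vector. It is an illustration under stated assumptions, not the descriptor used in the paper: the function name `voxelize`, the `grid_size` parameter, the centroid centering, and the scaling by the largest coordinate are all hypothetical choices made here for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize(coords: Sequence[Point], grid_size: int = 8) -> List[int]:
    """Return a flattened binary occupancy grid with grid_size**3 entries.

    Hypothetical helper: an assumed, simplified voxelization, not the
    paper's actual descriptor.
    """
    if not coords:
        raise ValueError("coords must contain at least one point")
    n = len(coords)
    # Move the centroid to the origin so the result ignores translation.
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in coords]
    # Scale by the largest absolute coordinate so points fall in [-1, 1].
    extent = max(max(abs(c) for c in p) for p in centered) or 1.0
    grid = [0] * grid_size ** 3
    for p in centered:
        idx = []
        for c in p:
            # Map [-1, 1] to a voxel index in [0, grid_size - 1].
            k = int((c / extent + 1.0) / 2.0 * grid_size)
            idx.append(min(k, grid_size - 1))
        x, y, z = idx
        # Mark the voxel containing this point as occupied.
        grid[(x * grid_size + y) * grid_size + z] = 1
    return grid


# Two points on the x-axis, voxelized on a 2 x 2 x 2 grid.
descriptor = voxelize([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], grid_size=2)
```

A vector of this kind has the same length for every structure, which is what allows it to be passed as a feature vector to an SVM implementation of the reader's choice.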
