SVM-based approaches for classifying protein tertiary structures

The tertiary structure of a protein molecule is the main factor that determines its chemical properties and its function. Knowledge of protein function is crucial in the development of new drugs, better crops, and synthetic biochemicals. With the rapid development of technology, the number of determined protein structures increases every day, and retrieving structurally similar proteins with current algorithms takes too long. Improving the efficiency of methods for protein structure retrieval and classification is therefore an important research issue in the bioinformatics community. In this paper, we present two SVM-based protein classifiers. Both classifiers use information about the conformation of protein structures in 3D space: the structures are represented by our voxel-based and ray-based protein descriptors. A part of the SCOP 1.73 database is used to evaluate the classifiers. The results show that our approach achieves 98.7% classification accuracy with the ray-based descriptor, while being much faster than other similar algorithms with comparable accuracy.
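To make the idea of a ray-based descriptor more concrete, the following is a minimal sketch in Python and not the descriptor used in the paper. It assumes a structure is given as a list of 3D backbone coordinates, picks ray directions with a Fibonacci-sphere layout, assigns each point to its nearest ray, and records the largest normalized distance from the centroid along each ray. The function names `fibonacci_sphere` and `ray_descriptor`, the parameter `n_rays`, and the normalization are all assumptions made for illustration; the sketch also skips rotation normalization entirely.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors (an assumed choice of ray directions)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        directions.append((r * math.cos(theta), r * math.sin(theta), z))
    return directions


def ray_descriptor(coords, n_rays=64):
    """Illustrative ray-style descriptor: per-ray maximum extent from the centroid.

    coords is a list of (x, y, z) tuples. The result has n_rays values in [0, 1]
    and is invariant to translation and uniform scaling, but not to rotation.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in coords]

    # Normalize by the largest distance from the centroid (scale invariance).
    dists = [math.sqrt(x * x + y * y + z * z) for x, y, z in centered]
    scale = max(dists)
    descriptor = [0.0] * n_rays
    if scale == 0.0:
        return descriptor  # degenerate input: all points coincide

    directions = fibonacci_sphere(n_rays)
    for (x, y, z), dist in zip(centered, dists):
        # Assign the point to the ray with the largest dot product,
        # i.e. the smallest angle to the point's position vector.
        best = max(
            range(n_rays),
            key=lambda k: x * directions[k][0]
            + y * directions[k][1]
            + z * directions[k][2],
        )
        descriptor[best] = max(descriptor[best], dist / scale)
    return descriptor
```

Fixed-length vectors of this kind are the sort of input an SVM classifier can be trained on, with one vector per protein structure and the SCOP class as the label.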
