Classification of protein structures by using fuzzy KNN classifier and protein voxel-based descriptor

Protein classification is one of the main themes in bioinformatics because it helps in understanding protein molecules. Classifying protein structures makes it possible to discover the evolutionary relations between them. Knowledge of protein structures and their possible functions could be used to regulate processes in organisms, for example by developing medications for different diseases. The literature offers a plethora of methods for protein classification, including manual, automatic, and semiautomatic methods. Manual methods are considered precise, but they are time consuming, so a large number of protein structures remain unclassified. Researchers therefore work intensively on methods that classify protein structures automatically with acceptable precision. In this paper, we propose an approach for classifying protein structures. Our protein voxel-based descriptor is used to describe the features of protein structures. To classify unclassified protein structures, we use a k-nearest-neighbors classifier based on fuzzy logic. For evaluation, we use the known classification of protein structures in the SCOP database. We present results from the evaluation of our approach. The results show that the proposed approach provides accurate classification of protein structures at reasonable speed.
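To make the classification step more concrete, the following is a minimal sketch of a fuzzy k-nearest-neighbors vote in the style of Keller et al. (1985). It is not the implementation evaluated in the paper. It assumes that each protein is already represented by a fixed-length numeric descriptor vector, that training labels are crisp (each training structure belongs to exactly one class), that distances are Euclidean, and that the fuzzifier is m = 2. The names `fuzzy_knn_memberships` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(train, query, k=3, m=2.0, eps=1e-12):
    """Return a dict mapping each class label to a membership in [0, 1].

    train: list of (descriptor_vector, label) pairs (assumed format).
    query: descriptor vector of the structure to classify.
    m:     fuzzifier (> 1); m = 2 weights neighbors by 1 / distance^2.
    """
    # Distance from the query to every training descriptor, keep the k closest.
    neighbors = sorted(
        ((math.dist(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]

    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    total = 0.0
    for distance, label in neighbors:
        # Closer neighbors contribute more; eps avoids division by zero.
        w = 1.0 / (distance ** exponent + eps)
        weights[label] += w
        total += w

    # Normalize so the memberships over all classes sum to 1.
    return {label: w / total for label, w in weights.items()}


def classify(train, query, k=3, m=2.0):
    """Assign the class with the highest fuzzy membership."""
    memberships = fuzzy_knn_memberships(train, query, k=k, m=m)
    return max(memberships, key=memberships.get)


if __name__ == "__main__":
    # Toy two-dimensional descriptors; real descriptors would be much longer.
    toy_train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
                 ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
    print(classify(toy_train, (0.2, 0.1)))
```

Unlike a plain majority vote, the membership values indicate how strongly the query belongs to each candidate class, which can be used to flag ambiguous assignments.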
