Protein Fold Classification using Kohonen's Self-Organizing Map

Protein fold classification is an important problem in bioinformatics and a challenging task for machine-learning algorithms. In this paper we present a method that classifies protein folds using Kohonen’s Self-Organizing Map (SOM), together with a comparison of several approaches to protein fold classification. We use SOM, Fisher Linear Discriminant Analysis (FLD), K-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) methods to classify three SCOP folds using six features (amino acid composition, predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity and polarizability). The novelty of this paper lies in the way SOM is applied to these six features; it also shows how SOM performs relative to the other methods in protein fold classification. The methods are tested on 120 proteins using 10-fold cross-validation, and SOM obtains a classification performance of 93.33%.
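
To make the evaluation setup concrete, the following is a minimal sketch of SOM-based classification under 10-fold cross-validation. It is not the authors' implementation: the grid size, learning-rate and neighbourhood schedules, the majority-vote node labelling, and all function names are assumptions chosen for illustration, and the data are synthetic 6-dimensional vectors for three classes rather than the paper's protein features.

```python
import numpy as np


def train_som(X, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every node, used for the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks from sigma0 to 0.5
            # Best-matching unit: the node whose weight vector is closest to the sample.
            bmu = np.argmin(((weights - X[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian neighbourhood
            weights += lr * h[:, None] * (X[i] - weights)
            step += 1
    return weights


def label_nodes(weights, X, y, n_classes):
    """Give each node the majority class of the training samples it wins; -1 if none."""
    votes = np.zeros((len(weights), n_classes), dtype=int)
    for x, label in zip(X, y):
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        votes[bmu, label] += 1
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1
    return labels


def predict(weights, node_labels, X):
    """Classify each sample by the label of its nearest labelled node."""
    labeled = np.flatnonzero(node_labels >= 0)
    d = ((X[:, None, :] - weights[labeled][None, :, :]) ** 2).sum(axis=2)
    return node_labels[labeled[d.argmin(axis=1)]]


def cross_validate(X, y, n_classes, k=10, seed=0):
    """k-fold cross-validation; returns the fraction of correctly classified samples."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        w = train_som(X[train_idx], seed=seed)
        node_labels = label_nodes(w, X[train_idx], y[train_idx], n_classes)
        correct += int((predict(w, node_labels, X[test_idx]) == y[test_idx]).sum())
    return correct / len(X)


# Synthetic stand-in data (an assumption, not the paper's dataset):
# 120 samples, 3 classes, 6 features, well-separated class centres.
data_rng = np.random.default_rng(42)
centers = 6.0 * np.array([[1, 0, 0, 1, 0, 0],
                          [0, 1, 0, 0, 1, 0],
                          [0, 0, 1, 0, 0, 1]], dtype=float)
y = np.repeat(np.arange(3), 40)
X = centers[y] + data_rng.normal(scale=1.0, size=(120, 6))

accuracy = cross_validate(X, y, n_classes=3)
print(f"10-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

Because a SOM is unsupervised, some labelling rule is needed to turn it into a classifier; majority voting over the training samples mapped to each node is one common choice, used here only as an assumption. The accuracy printed by this sketch says nothing about the 93.33% reported above, since the data are artificial.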
