Recognition of Structure Classification of Protein Folding by NN and SVM Hierarchical Learning Architecture

Classifying protein structure is an important task in the analysis of biological data. Through classification, the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The study of protein structures is based on sequences and their similarity, and it is a difficult task. Recently, many researchers have applied machine learning techniques to this protein classification problem. We likewise apply machine learning methods to the multi-class protein fold recognition problem by proposing a novel hierarchical learning architecture, which can be built from either NNs (neural networks) or SVMs (support vector machines) as basic building blocks. Our results show that both perform well. We use this architecture to attack the multi-class protein fold recognition problem as posed by Dubchak and Ding in 2001. With the same set of features, our method not only obtains better prediction accuracy and lower computation time, but also avoids the stochastic voting process used in the original approach.

[1] U. Hobohm, et al. Enlarged representative set of protein structures, 1994, Protein Science: a publication of the Protein Society.

[2] Chris H. Q. Ding, et al. Multi-class protein fold recognition using support vector machines and neural networks, 2001, Bioinform.

[3] Pierre Baldi, et al. Bioinformatics - the machine learning approach (2. ed.), 2000.

[4] Vladimir Vapnik, et al. The Nature of Statistical Learning, 1995.

[5] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[6] I. Muchnik, et al. Prediction of protein folding class using global description of amino acid sequence, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[8] John Moody, et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[9] Cathy H. Wu, et al. Neural networks and genome informatics, 2000.

[10] I. Muchnik, et al. Recognition of a protein fold in the context of the SCOP classification, 1999.

[11] K. Chou, et al. Prediction of protein structural classes, 1995, Critical Reviews in Biochemistry and Molecular Biology.

[12] Chih-Jen Lin, et al. A comparison of methods for multiclass support vector machines, 2002, IEEE Trans. Neural Networks.

[13] Tim J. P. Hubbard, et al. SCOP: a structural classification of proteins database, 1998, Nucleic Acids Res.

[14] A. G. Murzin, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures, 1995, Journal of Molecular Biology.

[15] David Haussler, et al. Using the Fisher Kernel Method to Detect Remote Protein Homologies, 1999, ISMB.

[16] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.