Prediction of Protein Subcellular Multi-locations with a Min-Max Modular Support Vector Machine

Predicting the subcellular multi-locations of proteins with machine learning techniques is a challenging problem in the computational biology community. Treating protein multi-location prediction as a multi-label pattern classification problem, we propose a new method for protein subcellular localization in this paper. The proposed method has two key components: it divides a severely unbalanced multi-location problem into a number of more balanced two-class subproblems using the part-versus-part task decomposition approach, and it learns all of the subproblems with the min-max modular support vector machine (M3-SVM). To evaluate the effectiveness of the proposed method, we perform experiments on a yeast protein data set using two task decomposition strategies and three feature extraction methods. The experimental results show that our method achieves the highest prediction accuracy, considerably better than that of the existing approach based on the traditional support vector machine.
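The two key steps named above, part-versus-part decomposition and min-max module combination, can be illustrated with a minimal sketch. It assumes the multi-label problem is first split into one binary problem per location (one-versus-rest over label sets); the abstract does not specify the exact decomposition strategies used. Within each binary problem, the positive and negative samples are cut into chunks, one module is trained per (positive chunk, negative chunk) pair, and the outputs are combined by taking the MIN over modules that share a positive chunk and then the MAX over those MIN units. The base learner here is a nearest-centroid scorer standing in for the SVM that M3-SVM trains on each subproblem; all function names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def chunk(samples: List[Vector], n_parts: int) -> List[List[Vector]]:
    """Split samples round-robin into at most n_parts non-empty chunks."""
    n_parts = max(1, min(n_parts, len(samples)))
    return [list(samples[i::n_parts]) for i in range(n_parts)]


def centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Stand-in base learner (NOT an SVM): score > 0 means x lies closer
    to the positive centroid than to the negative centroid."""
    dim = len(pos[0])
    cp = [sum(v[d] for v in pos) / len(pos) for d in range(dim)]
    cn = [sum(v[d] for v in neg) / len(neg) for d in range(dim)]

    def score(x: Vector) -> float:
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp

    return score


def train_m3(pos, neg, n_pos, n_neg, base=centroid_scorer) -> List[List[Scorer]]:
    """Part-versus-part decomposition: one module per (pos chunk, neg chunk) pair."""
    if not pos or not neg:
        raise ValueError("each binary problem needs positive and negative samples")
    pos_chunks, neg_chunks = chunk(pos, n_pos), chunk(neg, n_neg)
    # Modules are independent, so they could be trained in parallel.
    return [[base(p, q) for q in neg_chunks] for p in pos_chunks]


def predict_m3(modules: List[List[Scorer]], x: Vector) -> float:
    """Min-max combination: MIN across each row, then MAX over the rows."""
    return max(min(f(x) for f in row) for row in modules)


def train_multilabel(X, Y, locations, n_pos, n_neg) -> Dict[str, List[List[Scorer]]]:
    """One M3 classifier per location (assumed one-versus-rest split)."""
    models = {}
    for loc in locations:
        pos = [x for x, labels in zip(X, Y) if loc in labels]
        neg = [x for x, labels in zip(X, Y) if loc not in labels]
        models[loc] = train_m3(pos, neg, n_pos, n_neg)
    return models


def predict_multilabel(models, x: Vector) -> Set[str]:
    """Return every location whose M3 output is positive."""
    return {loc for loc, m in models.items() if predict_m3(m, x) > 0}


# Toy usage: feature 0 ~ "nucleus", feature 1 ~ "cytoplasm" (hypothetical data).
X = [(5, 0), (6, 0), (5, 1), (0, 5), (0, 6), (1, 5), (5, 5), (6, 6)]
Y = [{"nucleus"}, {"nucleus"}, {"nucleus"},
     {"cytoplasm"}, {"cytoplasm"}, {"cytoplasm"},
     {"nucleus", "cytoplasm"}, {"nucleus", "cytoplasm"}]
models = train_multilabel(X, Y, ["nucleus", "cytoplasm"], n_pos=2, n_neg=2)
print(predict_multilabel(models, (6, 1)))  # {'nucleus'}
```

Choosing `n_pos` and `n_neg` so that positive and negative chunks have similar sizes is what makes each subproblem more balanced than the original binary problem; a multi-located protein simply receives every location whose combined output is positive.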
