SVM-based method for subcellular localization of human proteins using amino acid compositions, their order and similarity search

Running title: SVM-based method for subcellular localization of human proteins

Summary

Here we report a systematic approach for predicting the subcellular localization (cytoplasmic, mitochondrial, nuclear, and plasma membrane) of human proteins. First, SVM-based modules using traditional amino acid composition and dipeptide (i+1) composition achieved overall accuracies of 76.6% and 77.8%, respectively. A PSI-BLAST similarity search against a non-redundant database of experimentally annotated proteins yielded 73.3% accuracy. To improve on these results, a hybrid module (hybrid1) combining amino acid composition, dipeptide composition, and similarity information was developed and attained a better accuracy of 84.9%. In addition, SVM modules based on higher-order dipeptide compositions (i+2, i+3, and i+4) were constructed and achieved overall accuracies of 79.7%, 77.5%, and 77.1%, respectively. Another SVM module, hybrid2, was developed using traditional dipeptide (i+1) and higher-order dipeptide (i+2, i+3, and i+4) compositions and gave an overall accuracy of 81.3%. We also developed an SVM module, hybrid3, based on amino acid composition, traditional and higher-order dipeptide compositions, and PSI-BLAST output, which achieved an overall accuracy of 84.4%. A web server, HSLPred, has been designed to predict the subcellular localization of human proteins using the above approaches.
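The composition features named in the summary can be illustrated with a minimal sketch. It assumes the common definitions: amino acid composition as the percentage of each of the 20 standard residues, and an (i+k) dipeptide composition as the percentage of each ordered residue pair separated by k positions. The function names, the percentage scaling, and the skipping of non-standard residues are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue percentages.

    Assumption: residues outside the 20 standard ones are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [100.0 * residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq, gap=1):
    """Return a 400-dimensional vector of (i, i+gap) pair percentages.

    gap=1 is the traditional dipeptide; gap=2, 3, 4 are the higher-order
    variants. Assumption: pairs containing a non-standard residue are skipped.
    """
    s = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(s) - gap):
        pair = s[i] + s[i + gap]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [100.0 * counts[p] / total if total else 0.0 for p in PAIRS]


def hybrid2_features(seq):
    """Concatenate the i+1 to i+4 dipeptide vectors (4 x 400 = 1600 values).

    Hypothetical helper: simple concatenation is one plausible way to combine
    the compositions; the summary does not state how they were combined.
    """
    features = []
    for gap in (1, 2, 3, 4):
        features.extend(dipeptide_composition(seq, gap))
    return features
```

Vectors of this kind could then be passed to any SVM implementation as fixed-length inputs; the kernel, parameters, and the encoding of the PSI-BLAST output used by the authors are not given in the summary and are not reproduced here.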
