Gram-positive bacterial protein subcellular localization prediction using features fusion strategy

Prediction of protein subcellular localization is one of the most challenging problems for researchers because of its importance in different branches of molecular biology and drug discovery. Over the last two decades, a large number of machine learning approaches have been applied to sequence-based features for the prediction of subcellular localization. Single features such as amino acid composition (AAC), pseudo amino acid composition (PseAAC), and the physicochemical property model (PPM) contain insufficient information because each captures only a single perspective. To overcome this problem, the main contribution of our work is to propose two fused feature representations, AACPPM and PAACPPM, which fuse PPM with AAC and PseAAC, respectively. A Support Vector Machine (SVM) is applied as the classifier to both the single and the fused feature representations of a Gram-positive bacterial dataset. The actual accuracy of AACPPM is 72.4%, which is 2% higher than that of the single feature representations and 6% higher than that reported by X. Qu et al. [1]. The locative accuracy of both AACPPM and PAACPPM is 73.2%, which is also 2% higher than that of the single feature representations.
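To make the idea of a fused representation concrete, the following is a minimal sketch, not the authors' implementation. It computes AAC as the relative frequency of the 20 standard amino acids and fuses it with a second feature vector by simple concatenation (serial fusion). The abstract does not state which fusion rule is used or how PPM is computed, so the concatenation, the helper names `aac` and `fuse_serial`, the toy sequence, and the placeholder PPM values are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter notation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Each entry is the relative frequency of one standard amino acid.
    Non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def fuse_serial(*feature_vectors):
    """Fuse feature vectors by concatenating them end to end.

    Assumption: serial (concatenation) fusion; the source does not
    specify the fusion rule.
    """
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


# Hypothetical toy inputs: a short made-up sequence and a placeholder
# vector standing in for PPM features (not real PPM values).
toy_sequence = "MKKLLAAG"
toy_ppm = [0.5, -0.2, 0.1]

aacppm = fuse_serial(aac(toy_sequence), toy_ppm)
print(len(aacppm))  # 20 AAC entries + 3 placeholder entries = 23
```

In a sketch like this, the fused vector would then be passed to a classifier such as an SVM in place of either single representation; in practice the component vectors are often rescaled to comparable ranges before concatenation so that one does not dominate the other.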

[1] Yuehui Chen, et al. Predicting the Subcellular Localization of Proteins with Multiple Sites Based on Multiple Features Fusion, 2014, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2] Shunfang Wang, et al. Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA, 2015, International journal of molecular sciences.

[3] Guozheng Li, et al. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble, 2015, BMC Bioinformatics.

[4] James G. Lyons, et al. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC, 2015, Journal of theoretical biology.

[5] Xiaoqi Zheng, et al. Prediction of bacterial protein subcellular localization by incorporating various features into Chou's PseAAC and a backward feature selection approach, 2014, Biochimie.

[6] Loris Nanni, et al. An Empirical Study of Different Approaches for Protein Classification, 2014, TheScientificWorldJournal.

[7] Jenn-Kang Hwang, et al. CELLO2GO: A Web Server for Protein subCELlular LOcalization Prediction with Functional Gene Ontology Annotation, 2014, PloS one.

[8] Chao Huang, et al. Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites, 2013, Biosyst.

[9] Sun-Yuan Kung, et al. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines, 2012, BMC Bioinformatics.

[10] Changqing Li, et al. An Ensemble Classifier for Eukaryotic Protein Subcellular Location Prediction Using Gene Ontology Categories and Amino Acid Hydrophobicity, 2012, PloS one.

[11] Kuo-Chen Chou, et al. Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM, 2007, Protein engineering, design & selection : PEDS.

[12] Ying Huang, et al. Prediction of protein subcellular locations using fuzzy k-NN method, 2004, Bioinform.

[13] Minoru Kanehisa, et al. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs, 2003, Bioinform.

[14] Jian Yang, et al. Feature fusion: parallel strategy vs. serial strategy, 2003, Pattern Recognit.

[15] Kuo-Chen Chou, et al. Artificial Neural Network Model for Predicting Protein Subcellular Location, 2002, Comput. Chem.

[16] K. Chou. Prediction of protein cellular attributes using pseudo‐amino acid composition, 2001, Proteins.

[17] A one-letter notation for amino acid sequences, 1972, Pure and applied chemistry. Chimie pure et appliquee.