Using Semi-supervised Discriminant Analysis to Predict Subcellular Localization of Gram-negative Bacterial Proteins

In this paper, an effective dimension reduction approach called semi-supervised discriminant analysis (SDA) is employed to address the protein subcellular localization problem. First, a novel protein sequence encoding method that combines pseudo amino acid composition (PseAAC) and dipeptide composition (DC) is introduced to represent each protein. Second, the SDA algorithm is applied to extract the essential discriminant features from the combined PseAAC and DC feature set. Finally, the K-nearest neighbor (K-NN) classifier is used to identify the subcellular localization of Gram-negative bacterial proteins. The proposed method can effectively utilize both the manifold information and the class information of the protein samples to guide the process of protein subcellular localization. To evaluate the prediction performance of the proposed algorithm, a jackknife test based on the nearest neighbor algorithm is performed on the Gram-negative bacterial protein data set. The results show that the method achieves a high total accuracy in a low-dimensional feature space, which indicates that the proposed approach is effective and practical.
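The abstract does not give the encoding parameters, so the following is only a minimal Python sketch of a combined PseAAC + DC representation. It assumes Chou's original PseAAC form (20 composition terms plus λ sequence-order correlation factors weighted by w) and a 400-dimensional dipeptide frequency vector. The function names, the values `lam=5` and `w=0.05`, and the `toy_scale` property table are hypothetical placeholders; a real encoder would use genuine physicochemical scales such as hydrophobicity, hydrophilicity and side-chain mass.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def standardise(scale):
    """Rescale a residue -> value table to zero mean, unit variance over the 20 residues."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mean = sum(vals) / 20
    sd = (sum((v - mean) ** 2 for v in vals) / 20) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, scales, lam=5, w=0.05):
    """Chou-style PseAAC: 20 composition terms + lam sequence-order correlation factors."""
    seq = [r for r in seq.upper() if r in AMINO_ACIDS]  # drop non-standard residues
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardise(s) for s in scales]

    def theta(a, b):
        # Mean squared (standardised) property difference between residues a and b.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # tau[k-1]: average correlation between residues k positions apart.
    tau = [sum(theta(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
           for k in range(1, lam + 1)]
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(tau)
    return [f / denom for f in freqs] + [w * t / denom for t in tau]


def dipeptide_composition(seq):
    """400-D frequencies of ordered adjacent residue pairs (pairs with non-standard residues skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = max(sum(counts.values()), 1)
    return [counts[d] / total for d in DIPEPTIDES]


def encode(seq, scales, lam=5, w=0.05):
    """Fused feature vector: PseAAC (20 + lam values) followed by DC (400 values)."""
    return pseaac(seq, scales, lam, w) + dipeptide_composition(seq)


# Hypothetical placeholder scale, NOT a real physicochemical index.
toy_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
features = encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", [toy_scale])
print(len(features))  # 425 = 20 + 5 + 400
```

Both parts are normalised frequencies, so each sums to 1 on its own; the fused 425-D vector is what a dimension reduction step such as SDA would then compress.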
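The SDA projection and the jackknife evaluation described in the abstract can be sketched as below. This is an illustrative NumPy/SciPy version, not the authors' code. It follows the generalized eigenproblem form of the original SDA paper (Cai, He and Han, ICCV 2007): between-class scatter S_b and total scatter S_t come from labeled samples, and a graph Laplacian L built on all samples adds the manifold term, so directions maximise aᵀS_b a / aᵀ(S_t + αXᵀLX + βI)a. The 0/1 k-nearest-neighbour graph, the small Tikhonov term βI, the values of α, β and k, the function names, and the synthetic Gaussian data are all assumptions. A 1-NN classifier stands in for the K-NN step.

```python
import numpy as np
from scipy.linalg import eigh


def knn_laplacian(X, k=5):
    """L = D - W for a symmetric 0/1 k-nearest-neighbour graph (rows of X are samples)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T    # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                       # a sample is not its own neighbour
    k = min(k, n - 1)
    nn = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                               # symmetrise the graph
    return np.diag(W.sum(axis=1)) - W


def sda(X_lab, y, X_all, alpha=0.1, beta=0.01, k=5):
    """Projection matrix (d x (c-1)) maximising a^T Sb a / a^T (St + alpha X^T L X + beta I) a."""
    classes = np.unique(y)
    d = X_lab.shape[1]
    mu = X_lab.mean(axis=0)
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X_lab[y == c]
        diff = (Xc.mean(axis=0) - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)              # between-class scatter
    centred = X_lab - mu
    St = centred.T @ centred                             # total scatter (labeled samples only)
    L = knn_laplacian(X_all, k)                          # manifold term uses all samples
    B = St + alpha * (X_all.T @ L @ X_all) + beta * np.eye(d)
    _, vecs = eigh(Sb, B)                                # generalized problem, ascending eigenvalues
    return vecs[:, ::-1][:, : len(classes) - 1]         # top c-1 discriminant directions


def jackknife_1nn(X, y, **sda_kwargs):
    """Leave-one-out accuracy: SDA is re-fitted every round, then 1-NN in the projected space."""
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # The held-out sample still enters the graph, but only as unlabeled data.
        A = sda(X[train], y[train], X, **sda_kwargs)
        Z, z = X[train] @ A, X[i] @ A
        nearest = np.argmin(np.sum((Z - z) ** 2, axis=1))
        correct += int(y[train][nearest] == y[i])
    return correct / n


# Toy usage on synthetic data: three well-separated Gaussian classes in 10-D.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[1, 0] = centers[2, 1] = 6.0
X = np.vstack([rng.normal(c, 1.0, size=(20, 10)) for c in centers])
y = np.repeat([0, 1, 2], 20)
print("jackknife accuracy:", jackknife_1nn(X, y))
```

Re-fitting SDA inside every jackknife round keeps the held-out label out of S_b and S_t; learning a single projection from all labeled proteins first would leak the test label into the projection and tend to overstate the jackknife accuracy.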
