Weighted feature dimensions according to Fisher's linear discriminant rate and its application on protein sub-cellular localization

Efficient prediction of protein sub-cellular localization has recently become an active research topic. Feature extraction plays an important role in accurately classifying, or localizing, proteins. Because feature dimensions contribute unequally to classification, this paper amplifies the dimensions that matter most by weighting each one with its Fisher linear discriminant rate. The k-nearest neighbor (KNN) algorithm is then used to classify the test samples. The results show that KNN with linear discriminant analysis (LDA) dimensionality reduction predicts more accurately than KNN applied directly, and that the proposed method, KNN with Fisher's linear discriminant rate weighting combined with LDA dimensionality reduction, further reduces the impact of redundancy and improves the accuracy of protein localization.
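The weighting idea can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes a common per-dimension definition of the Fisher discriminant ratio: between-class scatter divided by within-class scatter. The function names `fisher_ratios` and `knn_predict`, the toy data, and the choice `k=3` are all hypothetical, and the LDA dimensionality reduction step is omitted for brevity.

```python
from collections import Counter


def fisher_ratios(X, y, eps=1e-12):
    """Per-dimension between-class / within-class scatter (assumed definition)."""
    n_dims = len(X[0])
    overall_mean = [sum(row[j] for row in X) / len(X) for j in range(n_dims)]
    between = [0.0] * n_dims
    within = [0.0] * n_dims
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        for j in range(n_dims):
            class_mean = sum(r[j] for r in rows) / len(rows)
            between[j] += len(rows) * (class_mean - overall_mean[j]) ** 2
            within[j] += sum((r[j] - class_mean) ** 2 for r in rows)
    # eps guards against division by zero when a dimension has no within-class spread
    return [b / (w + eps) for b, w in zip(between, within)]


def knn_predict(X_train, y_train, x, k=3, weights=None):
    """Majority vote among the k nearest training samples.

    Each dimension is scaled by its weight before the squared Euclidean
    distance is computed; weights=None gives plain, unweighted KNN.
    """
    if weights is None:
        weights = [1.0] * len(x)

    def dist2(a, b):
        return sum((w * (ai - bi)) ** 2 for w, ai, bi in zip(weights, a, b))

    nearest = sorted(range(len(X_train)), key=lambda i: dist2(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: dimension 0 separates the classes, dimension 1 is large-amplitude noise.
X_toy = [[0.0, 5.0], [0.1, -5.0], [0.2, 4.0],
         [1.0, -4.0], [1.1, 5.0], [1.2, -5.0]]
y_toy = [0, 0, 0, 1, 1, 1]

weights = fisher_ratios(X_toy, y_toy)
query = [0.05, -5.0]
plain = knn_predict(X_toy, y_toy, query, k=3)
weighted = knn_predict(X_toy, y_toy, query, k=3, weights=weights)
```

On this toy set the noisy dimension dominates the unweighted distance, while the Fisher weights shrink it so the discriminative dimension decides the vote. Real protein features (for example pseudo-amino acid composition vectors) would replace the toy data.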
