Protein Subcellular Localization with Gaussian Kernel Discriminant Analysis and Its Kernel Parameter Selection

Kernel discriminant analysis (KDA) is a dimension reduction and classification algorithm based on the nonlinear kernel trick. It can be applied to high-dimensional, complex biological data as a preprocessing step before classification tasks such as protein subcellular localization. Kernel parameters strongly affect the performance of a KDA model; in particular, for KDA with the popular Gaussian kernel, selecting the scale parameter remains a challenging problem. This paper therefore introduces the KDA method and proposes a new method for Gaussian kernel parameter selection. The method rests on the observation that, for a suitable kernel parameter, the difference between the reconstruction errors of edge normal samples and those of interior normal samples should be maximized. Experiments on several standard protein subcellular localization data sets show that the overall accuracy of protein classification prediction with KDA is much higher than without KDA. The kernel parameter of KDA also has a large impact on efficiency: the proposed method produces an optimum parameter with which the new algorithm performs as effectively as the traditional ones while reducing computational time.
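
The selection criterion described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the reconstruction error is computed or how edge and interior samples are identified, so the code below assumes a kernel-PCA-style reconstruction error in feature space and, as a purely hypothetical proxy, treats the samples farthest from the data mean as edge samples. The function names, the `edge_fraction` and `n_components` parameters, and the grid of candidate values are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kpca_reconstruction_error(X, sigma, n_components):
    """Feature-space reconstruction error of each training sample.

    Assumption: a kernel-PCA-style error (centered self-similarity minus the
    squared projections onto the leading components).
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel centered in feature space
    w, V = np.linalg.eigh(Kc)                # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_components]
    w, V = w[order], V[:, order]
    keep = w > 1e-10                         # drop numerically null directions
    w, V = w[keep], V[:, keep]
    alphas = V / np.sqrt(w)                  # unit-norm feature-space eigenvectors
    proj = Kc @ alphas                       # projections of the training samples
    return np.diag(Kc) - np.sum(proj**2, axis=1)


def select_sigma(X, sigmas, n_components=5, edge_fraction=0.2):
    """Pick the sigma maximizing mean edge error minus mean interior error.

    Assumption: 'edge' samples are the edge_fraction of samples farthest from
    the data mean; this is a hypothetical stand-in for a real edge detector.
    """
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    n_edge = max(1, int(round(edge_fraction * X.shape[0])))
    ranked = np.argsort(dist)
    interior, edge = ranked[:-n_edge], ranked[-n_edge:]
    scores = []
    for sigma in sigmas:
        err = kpca_reconstruction_error(X, sigma, n_components)
        scores.append(float(err[edge].mean() - err[interior].mean()))
    best = sigmas[int(np.argmax(scores))]
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 3))             # synthetic stand-in for protein features
    best, scores = select_sigma(X, [0.5, 1.0, 2.0, 4.0])
    print("selected sigma:", best)
```

Scanning a small grid and scoring each candidate once keeps the cost at one eigendecomposition per candidate, which is the kind of saving over cross-validated search that the abstract alludes to; the selected value would then be passed to the Gaussian kernel used by KDA.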
