Kernel principal component analysis based on semi-supervised dimensionality reduction and its application on protein subnuclear localization

The kernel parameter of kernel principal component analysis (KPCA) strongly affects how much useful information can be extracted from high-dimensional, nonlinear protein data. If its value is set unreasonably, the dimension-reduced data are insufficient for discrimination. Motivated by this, this paper proposes a new method that searches for the optimal window width parameter of the Gaussian kernel by introducing the idea of semi-supervised learning. We first employ the particle swarm optimization (PSO) algorithm to search for the optimal interval of the kernel parameter using a new discriminant criterion. A traversal method is then applied to search for the optimal parameter within the obtained interval. To verify the feasibility of the proposed approach, named KPCA based on semi-supervised dimensionality reduction, numerical experiments were conducted on a public dataset to predict protein subnuclear location with a k-nearest neighbors (KNN) classifier. The final results of the jackknife test show that our method is efficient and effective.
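The second stage of the pipeline (traversing an interval of Gaussian window widths, reducing the data with KPCA, and scoring each width with a jackknife nearest-neighbor test) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the PSO stage and the new discriminant criterion are not specified in the abstract and are not reproduced here, so the interval bounds `lo` and `hi` are simply assumed to be given. The function names, the parametrization exp(-d^2 / (2 * sigma^2)), the use of 1-NN rather than a general k, and the choice to compute the embedding once on all samples before the leave-one-out loop are all simplifying assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_embed(X, sigma, n_components):
    """Project the samples onto the leading kernel principal components."""
    n = X.shape[0]
    K = gaussian_kernel(X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H                             # kernel centered in feature space
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals = np.maximum(vals[idx], 1e-12)        # guard against tiny negative values
    return vecs[:, idx] * np.sqrt(vals)        # coordinates of the samples

def jackknife_1nn_accuracy(Z, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on Z.

    Assumption: the embedding Z is computed once on all samples, so only the
    classifier (not the KPCA step) is re-evaluated per held-out sample.
    """
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(d2, np.inf)               # a sample may not vote for itself
    nearest = np.argmin(d2, axis=1)
    return float(np.mean(y[nearest] == y))

def traverse_sigma(X, y, lo, hi, steps=20, n_components=2):
    """Scan [lo, hi] on an even grid and keep the width with the best accuracy."""
    best_sigma, best_acc = None, -1.0
    for sigma in np.linspace(lo, hi, steps):
        Z = kpca_embed(X, sigma, n_components)
        acc = jackknife_1nn_accuracy(Z, y)
        if acc > best_acc:
            best_sigma, best_acc = float(sigma), acc
    return best_sigma, best_acc
```

Calling `traverse_sigma(X, y, lo, hi)` with a feature matrix `X` of shape (samples, features) and an integer label vector `y` returns the selected width and its jackknife accuracy. In the paper's setting, `lo` and `hi` would come from the PSO search rather than being chosen by hand.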
