Using the nonlinear dimensionality reduction method for the prediction of subcellular localization of Gram-negative bacterial proteins

Automated identification of protein function is one of the central problems in computational biology. A key step toward it is predicting the subcellular location to which a protein belongs, since a protein's localization correlates closely with its function. A wide variety of methods for predicting protein subcellular localization have been proposed in recent years. Linear dimensionality reduction (DR) methods have been introduced to address the high dimensionality of protein sequence representations by transforming them into a lower-dimensional space. However, linear methods are not well suited to complex biological systems that have nonlinear characteristics. Herein, we use nonlinear DR methods, such as the kernel DR method, to capture the nonlinear characteristics of the high-dimensional feature space. The K-nearest-neighbor (K-NN) classifier is then employed to identify the subcellular localization of Gram-negative bacterial proteins from their reduced low-dimensional features. The experimental results are encouraging, indicating that the applied nonlinear DR method is effective for the complicated problem of predicting the subcellular localization of Gram-negative bacterial proteins. An online web server for predicting the subcellular location of Gram-negative bacterial proteins is available at http://202.120.37.185:8080/.
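The two-stage pipeline described above (nonlinear DR followed by K-NN on the reduced features) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify which kernel DR method or kernel was used, so kernel PCA with a Gaussian (RBF) kernel is assumed here, and two noisy concentric rings in 2-D stand in for real protein feature vectors. The names `rbf_kernel`, `KernelPCAKNN`, and `make_rings`, the labels `class_a` and `class_b`, and the parameter values are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


class KernelPCAKNN:
    """Assumed pipeline: kernel PCA for nonlinear DR, then K-NN on the reduced features."""

    def __init__(self, n_components=2, gamma=0.5, k=1):
        self.n_components = n_components
        self.gamma = gamma
        self.k = k

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        K = rbf_kernel(self.X, self.X, self.gamma)
        # Center the kernel matrix, i.e. center the data in feature space.
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        Kc = K - self.col_mean[None, :] - self.col_mean[:, None] + self.all_mean
        # Keep the leading eigenvectors, scaled by 1/sqrt(eigenvalue).
        vals, vecs = np.linalg.eigh(Kc)
        top = np.argsort(vals)[::-1][: self.n_components]
        self.alphas = vecs[:, top] / np.sqrt(vals[top])
        self.Z = self.transform(self.X)  # reduced training features
        return self

    def transform(self, Xq):
        """Project query points onto the leading kernel principal components."""
        Kq = rbf_kernel(np.asarray(Xq, dtype=float), self.X, self.gamma)
        Kqc = Kq - Kq.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        return Kqc @ self.alphas

    def predict(self, Xq):
        """Majority vote among the k nearest training points in the reduced space."""
        Zq = self.transform(Xq)
        dist = np.linalg.norm(Zq[:, None, :] - self.Z[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, : self.k]
        votes = []
        for row in nearest:
            labels, counts = np.unique(self.y[row], return_counts=True)
            votes.append(labels[np.argmax(counts)])
        return np.array(votes)


def make_rings(n_per_class=40, seed=0):
    """Toy data (hypothetical): two noisy concentric rings, not linearly separable."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, radius in (("class_a", 1.0), ("class_b", 3.0)):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_class)
        r = radius + rng.normal(0.0, 0.1, n_per_class)
        X.append(np.column_stack([r * np.cos(theta), r * np.sin(theta)]))
        y += [label] * n_per_class
    return np.vstack(X), np.array(y)


if __name__ == "__main__":
    X_train, y_train = make_rings(seed=0)
    X_test, y_test = make_rings(n_per_class=10, seed=1)
    model = KernelPCAKNN(n_components=2, gamma=0.5, k=3).fit(X_train, y_train)
    accuracy = (model.predict(X_test) == y_test).mean()
    print("held-out accuracy on toy data:", accuracy)
```

In a real setting the rows of `X` would be numerical descriptors derived from protein sequences and the labels would be subcellular locations; the kernel width `gamma`, the number of retained components, and `k` would all need tuning, for example by cross-validation.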
