PREDICTING SUBCHLOROPLAST LOCATIONS OF PROTEINS BASED ON THE GENERAL FORM OF CHOU'S PSEUDO AMINO ACID COMPOSITION: APPROACHED FROM OPTIMAL TRIPEPTIDE COMPOSITION

Chloroplasts are organelles found in plant cells that conduct photosynthesis. The subchloroplast locations of proteins are correlated with their functions. With the rapid growth in available protein data, a computational method for predicting the subchloroplast locations of chloroplast proteins is highly desirable. In this study, we propose a novel method that predicts the subchloroplast locations of proteins from their tripeptide compositions. The binomial distribution is first used to optimize the feature set, and a support vector machine then performs the prediction. The proposed method was tested on a reliable and rigorous dataset of 259 chloroplast proteins with sequence identity ≤ 25%. In jackknife cross-validation, 92.21% of envelope proteins, 93.20% of thylakoid membrane proteins, 52.63% of thylakoid lumen proteins and 85.00% of stroma proteins were correctly identified. The overall accuracy reached 88.03%, which is higher than that of other models. Based on this method, a predictor called ChloPred has been built and is freely available at http://cobi.uestc.edu.cn/people/hlin/tools/ChloPred/. The predictor will provide important information for theoretical and experimental research on chloroplast proteins.
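
The feature-extraction and feature-ranking steps described above can be illustrated with a minimal Python sketch. It is not the ChloPred implementation. The abstract does not give the exact scoring formula, so the sketch assumes one common form of binomial ranking: each tripeptide is scored by 1 minus the probability of seeing at least its observed count in a class by chance. All function names are hypothetical, and the support vector machine step is omitted.

```python
from collections import Counter
from itertools import product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_counts(seq):
    """Count overlapping tripeptides, skipping any with non-standard residues."""
    seq = seq.upper()
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    return Counter(w for w in windows if all(ch in AMINO_ACIDS for ch in w))


def tripeptide_composition(seq):
    """Return the 8000-dimensional vector of tripeptide frequencies."""
    counts = tripeptide_counts(seq)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIPEPTIDES]


def binomial_confidence(n_ij, n_i, q_j):
    """Assumed score: 1 - P(X >= n_ij) for X ~ Binomial(n_i, q_j)."""
    tail = sum(comb(n_i, m) * q_j ** m * (1 - q_j) ** (n_i - m)
               for m in range(n_ij, n_i + 1))
    return 1.0 - tail


def rank_tripeptides(seqs_by_class):
    """Rank observed tripeptides by their best per-class confidence."""
    class_counts = {c: sum((tripeptide_counts(s) for s in seqs), Counter())
                    for c, seqs in seqs_by_class.items()}
    class_totals = {c: sum(cnt.values()) for c, cnt in class_counts.items()}
    grand_total = sum(class_totals.values())
    overall = sum(class_counts.values(), Counter())
    scores = {}
    for tri, n_i in overall.items():
        scores[tri] = max(
            binomial_confidence(class_counts[c][tri], n_i,
                                class_totals[c] / grand_total)
            for c in class_counts)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a workflow of this kind, the top-ranked tripeptides would be kept as the optimized feature set and their frequencies passed to a classifier. For realistic dataset sizes the direct sum in `binomial_confidence` can overflow floating point, so a log-space or library implementation of the binomial tail would be preferable.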
