Using protein-protein interaction network information to predict the subcellular locations of proteins in budding yeast.

Knowledge of protein subcellular localization is vitally important for an in-depth understanding of the intricate pathways that regulate biological processes at the cellular level. With the rapidly increasing number of newly found protein sequences in the post-genomic age, many automated methods have been developed to help annotate their subcellular locations in a timely manner. However, very few of them use protein-protein interaction (PPI) network information. In this paper, we introduce a new concept called "tethering potential", by which PPI information can be effectively fused into the formulation of protein samples. Based on this network framework, a new predictor called Yeast-PLoc has been developed for assigning budding yeast proteins to 19 subcellular location sites. A purely sequence-based approach, called the "hybrid-property" method, is integrated into Yeast-PLoc as a fall-back for proteins without sufficient PPI information. The overall success rate by the jackknife test on the 4,683 yeast proteins in the training dataset was 70.25%. Furthermore, the success rate of Yeast-PLoc on an independent dataset was remarkably higher than those of some other existing predictors, indicating that incorporating PPI information is quite promising. As a user-friendly web server, Yeast-PLoc is freely accessible at http://yeastloc.biosino.org/.
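The overall workflow described above (predict from network neighbours, fall back to a sequence-based method when PPI information is insufficient, and score by the jackknife test) can be illustrated with a minimal sketch. The abstract does not define the "tethering potential" or the "hybrid-property" method, so this example substitutes a plain majority vote over labelled interaction partners and a placeholder fall-back; it is not the Yeast-PLoc algorithm. All names (`predict_location`, `jackknife_success_rate`, `toy_fallback`, the toy proteins and their labels) are hypothetical.

```python
from collections import Counter


def predict_location(protein, ppi, labels, fallback, min_neighbors=1):
    """Predict a location by majority vote over labelled interaction partners.

    ASSUMPTION: a simple neighbour vote stands in for the paper's
    "tethering potential", which the abstract does not define.
    """
    votes = Counter(
        labels[n] for n in ppi.get(protein, ()) if n in labels and n != protein
    )
    # Too few labelled partners: defer to the sequence-based fall-back.
    if sum(votes.values()) < min_neighbors:
        return fallback(protein)
    # Break ties deterministically by taking the alphabetically first label.
    best = max(votes.values())
    return min(loc for loc, count in votes.items() if count == best)


def jackknife_success_rate(ppi, labels, fallback):
    """Leave-one-out test: hide each protein's label, predict it, count hits."""
    hits = 0
    for protein, true_loc in labels.items():
        # Rebuilding the label map each round is quadratic; fine for a sketch.
        held_out = {p: loc for p, loc in labels.items() if p != protein}
        if predict_location(protein, ppi, held_out, fallback) == true_loc:
            hits += 1
    return hits / len(labels)


# Toy data, invented purely for illustration (not from the paper's dataset).
TOY_PPI = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": set(),   # no interaction partners, so the fall-back is used
    "E": {"A"},
}
TOY_LABELS = {
    "A": "nucleus",
    "B": "nucleus",
    "C": "nucleus",
    "D": "cytoplasm",
    "E": "mitochondrion",
}


def toy_fallback(protein):
    """Placeholder for a sequence-based predictor; always answers 'cytoplasm'."""
    return "cytoplasm"


if __name__ == "__main__":
    rate = jackknife_success_rate(TOY_PPI, TOY_LABELS, toy_fallback)
    print(f"jackknife success rate: {rate:.2%}")
```

In this toy network, protein E is misassigned because its only partner lies in a different compartment. This is the kind of case in which a neighbour-only vote fails and a richer formulation would be needed.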
