Predicting multiplex subcellular localization of proteins using protein-protein interaction network: a comparative study

Background: Proteins that interact in vivo tend to reside within the same or "adjacent" subcellular compartments. This observation provides opportunities to reveal protein subcellular localization in the context of the protein-protein interaction (PPI) network. So far, however, only a few efforts have been made in this regard, all based on heuristic rules.

Results: We systematically and quantitatively validate the hypothesis that proteins physically interacting with each other probably share at least one common subcellular localization. Building on this result, we introduce, for the first time, four graph-based semi-supervised learning algorithms originally proposed for protein function prediction (Majority, χ²-score, GenMultiCut and FunFlow) to assign "multiplex localization" to proteins. We analyze these approaches by performing a large-scale cross-validation on a Saccharomyces cerevisiae proteome compiled from BioGRID and comparing their predictions for 22 protein subcellular localizations. Furthermore, we build an ensemble classifier to associate 529 unlabeled and 137 ambiguously annotated proteins with subcellular localizations, most of which have been verified in previous experimental studies.

Conclusions: Physical interaction between proteins provides an essential clue to their co-localization. Compared to the local approaches, the global algorithms consistently achieve superior performance.
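To make the idea of a local, neighbor-based approach concrete, the following is a minimal sketch of majority-style voting for multiplex localization. It assumes that the Majority method ranks compartments by how many annotated interaction partners carry them; the function name `majority_localizations`, the `top_k` cutoff, and the toy proteins `P1` to `P4` are hypothetical and not taken from the paper, whose exact scoring and thresholding are not described in the abstract.

```python
from collections import Counter


def majority_localizations(protein, interactions, annotations, top_k=2):
    """Rank compartments by the number of annotated neighbors carrying them.

    interactions: dict mapping a protein to the set of its interaction partners
    annotations:  dict mapping a protein to its set of known localizations
    top_k:        hypothetical cutoff on how many labels to return
    """
    votes = Counter()
    for neighbor in interactions.get(protein, ()):
        # A neighbor with several localizations votes once for each of them;
        # unlabeled neighbors contribute nothing.
        votes.update(annotations.get(neighbor, ()))
    # Sort by descending vote count, breaking ties alphabetically
    # so that the output is deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [localization for localization, _ in ranked[:top_k]]


# Toy, made-up network: P1 is unlabeled and has three labeled partners.
interactions = {"P1": {"P2", "P3", "P4"}}
annotations = {
    "P2": {"nucleus"},
    "P3": {"nucleus", "cytoplasm"},
    "P4": {"cytoplasm", "mitochondrion"},
}

predicted = majority_localizations("P1", interactions, annotations)
print(predicted)
```

Because each neighbor may vote for several compartments, the query protein can receive more than one label, which is the sense of "multiplex localization" here. A global method would instead propagate labels across the whole network rather than looking only at direct partners.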
