Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species

The function of a protein is intimately tied to its subcellular localization. Although localizations have been measured for many yeast proteins through systematic GFP fusions, similar studies in other branches of life are still forthcoming. In the interim, various machine-learning methods have been proposed to predict localization using physical characteristics of a protein, such as amino acid content, hydrophobicity, side-chain mass and domain composition. However, there has been comparatively little work on predicting localization using protein networks. Here, we predict protein localizations by integrating an extensive set of protein physical characteristics over a protein's extended protein–protein interaction neighborhood, using a classification framework called ‘Divide and Conquer k-Nearest Neighbors’ (DC-kNN). These predictions achieve significantly higher accuracy than two well-known methods for predicting protein localization in yeast. Using new GFP imaging experiments, we show that the network-based approach can extend and revise previous annotations made from high-throughput studies. Finally, we show that our approach remains highly predictive in higher eukaryotes such as fly and human, in which most localizations are unknown and the protein network coverage is less substantial.
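The core idea above, combining a protein's own physical characteristics with those of its interaction neighbors before a nearest-neighbor classification, can be illustrated with a minimal sketch. This is not the authors' DC-kNN implementation: the "divide and conquer" step is not specified in the abstract and is omitted here. The function names, the toy two-dimensional features, the toy network and the choice to average only over direct interactors are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def neighborhood_features(protein, features, network):
    """Concatenate a protein's own features with the mean over its direct interactors.

    Hypothetical helper: averaging over direct neighbors is an illustrative
    stand-in for integrating features over an extended neighborhood.
    """
    own = features[protein]
    neighbors = [n for n in network.get(protein, ()) if n in features]
    if neighbors:
        mean = [
            sum(features[n][i] for n in neighbors) / len(neighbors)
            for i in range(len(own))
        ]
    else:
        # No known interactors: fall back to the protein's own features.
        mean = list(own)
    return list(own) + mean


def knn_predict(query, labeled, k=3):
    """Plain k-nearest-neighbor majority vote using Euclidean distance."""
    nearest = sorted(labeled, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): two physical characteristics per protein.
features = {
    "A": (0.9, 0.1), "B": (0.8, 0.2), "C": (0.85, 0.15),
    "D": (0.1, 0.9), "E": (0.2, 0.8), "F": (0.15, 0.85),
    "X": (0.5, 0.5),  # ambiguous on its own features
}
network = {
    "A": ["B", "C"], "B": ["A", "C", "X"], "C": ["A", "B", "X"],
    "D": ["E", "F"], "E": ["D", "F"], "F": ["D", "E"],
    "X": ["B", "C"],
}
labels = {
    "A": "nucleus", "B": "nucleus", "C": "nucleus",
    "D": "mitochondrion", "E": "mitochondrion", "F": "mitochondrion",
}

# Build the labeled training set in the network-augmented feature space.
labeled = [(neighborhood_features(p, features, network), loc) for p, loc in labels.items()]

# Classify the unlabeled protein X using its own and its neighbors' features.
x_vector = neighborhood_features("X", features, network)
prediction = knn_predict(x_vector, labeled, k=3)
print(prediction)
```

In this toy case, protein X is uninformative from its own features alone, so the prediction is driven by the features of its interactors. That is the kind of signal a network-based approach is meant to exploit.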
