Refining Protein Subcellular Localization

The study of protein subcellular localization is important for elucidating protein function. Even in well-studied organisms such as yeast, experimental methods have not provided full coverage of localization. Bioinformatic predictors of localization can help bridge this gap. We have created a Bayesian network predictor called PSLT2 that considers diverse protein characteristics, including the combinatorial presence of InterPro motifs and protein interaction data. We compared the localization predictions of PSLT2 with high-throughput experimental localization datasets. Disagreements between these methods generally involve proteins that transit through or reside in the secretory pathway. We used our multi-compartmental predictions to refine the localization annotations of yeast proteins, primarily by distinguishing between soluble lumenal proteins and soluble proteins peripherally associated with organelles. To our knowledge, this is the first tool to provide this functionality. We used these sub-compartmental predictions to characterize cellular processes on an organellar scale. Integrating diverse protein characteristics and protein interaction data in an appropriate setting can lead to high-quality, detailed localization annotations for whole proteomes. This type of resource is instrumental in developing models of whole organelles that provide insight into the extent of interaction and communication between organelles and help define organellar functionality.
