Whole proteome pI values correlate with subcellular localizations of proteins for organisms within the three domains of life.

Isoelectric point (pI) values have long been a standard measure for distinguishing between proteins. This article analyzes the distributions of pI values computed for all predicted ORFs in a selection of fully sequenced genomes. Histograms of these pI values confirm the bimodality previously observed for bacterial and archaeal genomes () and reveal a trimodality in eukaryotic genomes. We repeated the analysis on subsets of a nonredundant protein sequence database, each subset selected from the full database by annotated subcellular localization. Sequences annotated as cytosolic proteins and sequences annotated as integral membrane proteins have pI distributions that appear to correspond to the two modes observed in bacteria and archaea. Nuclear proteins have a broader pI distribution that may account for the third mode observed in eukaryotes. Based on this association between pI and subcellular localization, we conclude the following. The bimodal character of whole-proteome pI distributions in bacteria and archaea, and the trimodal character in eukaryotes, are likely to be general properties of proteomes. They are also likely to be associated with the need for different pI values depending on subcellular localization. Our analyses further suggest that the proportion of each proteome made up of membrane-associated proteins may currently be underestimated.
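The following is a minimal sketch of how pI values and their histogram might be computed from predicted ORF sequences. It is not the authors' implementation. It makes several assumptions chosen here only for illustration:

- a simple Henderson-Hasselbalch charge model;
- an illustrative pKa table (published pKa sets differ, and the set used in the article is not stated here);
- bisection on pH;
- 0.5-unit histogram bins.

The names `net_charge`, `isoelectric_point` and `pi_histogram` are hypothetical.

```python
"""Sketch: theoretical pI by bisection on net charge, plus a pI histogram.

Assumptions (not from the article): the pKa table below, the bisection
tolerance, and the 0.5-pH-unit histogram bins.
"""
from collections import Counter

# Illustrative pKa values (assumed); published pKa sets differ.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a sequence at the given pH."""
    counts = Counter(seq.upper())
    counts["Nterm"] = 1  # one free amino terminus per chain
    counts["Cterm"] = 1  # one free carboxyl terminus per chain
    # Positive groups carry charge when protonated.
    positive = sum(
        counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items()
    )
    # Negative groups carry charge when deprotonated.
    negative = sum(
        counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero.

    Net charge decreases monotonically with pH, so bisection converges.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid  # zero or negative: pI lies at or below mid
    return (lo + hi) / 2.0


def pi_histogram(seqs, bin_width: float = 0.5) -> dict:
    """Count pI values per bin; each key is the lower edge of its bin."""
    bins = Counter()
    for s in seqs:
        p = isoelectric_point(s)
        bins[round((p // bin_width) * bin_width, 2)] += 1
    return dict(sorted(bins.items()))
```

Applied to all translated ORFs of a genome, `pi_histogram` yields binned counts in which bimodal or trimodal structure would show up as separate peaks. Because the pKa set shifts individual pI values, such a sketch is better suited to revealing the overall shape of the distribution than to reproducing exact peak positions.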

[1] J. Berg. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 1998.

[2] T. Arakawa et al. Theory of protein solubility. Methods in Enzymology, 1985.

[3] Mark Borodovsky et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 1997.

[4] N. W. Davis et al. The complete genome sequence of Escherichia coli K-12. Science, 1997.

[5] B. Barrell et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature, 1998.

[6] Bacterial evolution. 1987.

[7] Y. Nakamura et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions (supplement). DNA Research, 1996.

[8] Ron D. Appel et al. The SWISS-2DPAGE database: what has changed during the last year. Nucleic Acids Res., 1999.

[9] B. Barrell et al. Life with 6000 Genes. Science, 1996.

[10] H. Mewes et al. Protein structural classes in five complete genomes. Nature Structural Biology, 1997.

[11] Ron D. Appel et al. Current status of the SWISS-2DPAGE database. Nucleic Acids Res., 1998.

[12] G. Likhtenshtein. Biomembranes. Biochimica et Biophysica Acta, 1992.

[13] Robert B. Gennis et al. Biomembranes: Molecular Structure and Function. 1988.

[14] Ron D. Appel et al. The 1999 SWISS-2DPAGE database update. Nucleic Acids Res., 2000.

[15] J. M. Ribeiro et al. Isoelectric points of proteins: theoretical determination. Analytical Biochemistry, 1989.

[16] R. Fleischmann et al. Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii. Science, 1996.

[17] Sayaka et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Research, 1996.

[18] S. Salzberg et al. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature, 1999.

[19] F. Neidhardt et al. Diagnosis of cellular states of microbial organisms using proteomics. Electrophoresis, 1999.

[20] Stephen M. Mount et al. The genome sequence of Drosophila melanogaster. Science, 2000.

[21] Rolf Apweiler et al. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 2000.

[22] M. Kanehisa et al. Tandem clusters of membrane proteins in complete genome sequences. Genome Research, 2000.