Multi‐modality of pI distribution in whole proteome

Multi‐modality of the pI distribution is a common feature of whole proteomes. Some researchers have attributed it to proteins with different subcellular locations and regarded it as a result of natural selection. We examined the pI distributions of predicted proteomes (from animals, plants, bacteria and archaea) and of a random proteome [random protein sequences constructed according to the specific amino acid composition and molecular weight (MW) distribution of the human predicted proteome]. Our results suggest that the multi‐modality arises from the discrete pKR values of the different ionizable amino acids. The amino acid composition and MW distribution of a proteome also contribute to its specific pI distribution. Although protein subcellular location is related to pI, our comparison with the random proteome shows that neither the multi‐modality nor the bias in pI values is caused by subcellular location. The multi‐modal distribution therefore appears to be merely a mathematical consequence of how pI is determined. The gap near neutral pI is caused by the absence of amino acids with a pKR near neutrality. This suggests that the choice of amino acids with ionizable side chains may have been constrained by the need for a particular pH environment during the origin of life. From this point of view, the distribution is ultimately a result of natural selection.
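The argument that multi‐modality follows from discrete side-chain pKa (pKR) values can be explored with a minimal sketch under several assumptions.

- **Charge model.** The net charge is a sum of Henderson-Hasselbalch terms, one per ionizable group. Because this charge falls monotonically with pH, the pI is found by bisection.
- **pKa table.** The values used are an illustrative table similar to the EMBOSS set. Published tables differ, and the sketch does not reproduce the authors' exact method.
- **Random proteome.** It uses a uniform amino acid composition in place of the human one. A uniform length range stands in for the MW distribution.
- **Names.** All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Mapping

# Illustrative pKa values (assumed; similar to the EMBOSS table, other tables differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def net_charge(counts: Mapping[str, int], ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch terms (one N- and one C-terminus)."""
    # Protonated (positive) fraction of each basic group: 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for aa in ("K", "R", "H"):
        charge += counts.get(aa, 0) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    # Deprotonated (negative) fraction of each acidic group: 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in ("D", "E", "C", "Y"):
        charge -= counts.get(aa, 0) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pI by bisection on [0, 14]; net charge decreases monotonically with pH."""
    counts = Counter(seq)
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(counts, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def random_proteome(n, composition, min_len, max_len, seed=0):
    """Hypothetical random proteome: residues drawn i.i.d. from `composition`,
    lengths drawn uniformly from [min_len, max_len] as a crude stand-in for MW."""
    rng = random.Random(seed)
    residues = list(composition)
    weights = [composition[aa] for aa in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=rng.randint(min_len, max_len)))
        for _ in range(n)
    ]


def pi_histogram(pis, bin_width=0.5):
    """Count pI values per bin; keys are the left edges of the bins."""
    bins = Counter(int(p // bin_width) for p in pis)
    return {round(b * bin_width, 2): bins[b] for b in sorted(bins)}


if __name__ == "__main__":
    uniform = {aa: 1.0 for aa in AMINO_ACIDS}  # assumed composition, not the human one
    proteome = random_proteome(1000, uniform, 100, 600)
    for left, count in pi_histogram(isoelectric_point(s) for s in proteome).items():
        print(f"{left:5.1f} {'#' * (count // 5)}")
```

Near neutral pH only His, Cys and the N-terminus titrate in this model, so the net charge changes slowly there. A small excess of strongly acidic or basic residues can therefore push the pI well away from 7. Printing the histogram lets one check whether sequences with no biological history already show peaks on either side of neutral pH. Replacing `uniform` with a measured composition would bring the sketch closer to the random proteome described above.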
