Protein identification with N- and C-terminal sequence tags in proteome projects.

Genome sequences are available for increasing numbers of organisms. The proteomes (the protein complement expressed by a genome) of many such organisms are being studied with two-dimensional (2D) gel electrophoresis. Here we have investigated the application of short N-terminal and C-terminal sequence tags to the identification of proteins separated on 2D gels. The theoretical N and C termini of 15,519 proteins, representing all SWISS-PROT entries for the organisms Mycoplasma genitalium, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and human, were analysed. Sequence tags were found to be surprisingly specific: depending on the species studied, N-terminal tags of four amino acid residues were unique for between 43% and 83% of proteins, and C-terminal tags of four amino acid residues were unique for between 74% and 97% of proteins. Sequence tags of five amino acid residues were found to be even more specific. To utilise this specificity for protein identification, we created a world-wide web-accessible protein identification program, TagIdent (http://www.expasy.ch/www/tools.html), which matches sequence tags of up to six amino acid residues, together with estimated protein pI and mass, against proteins in the SWISS-PROT database. We demonstrate the utility of this identification approach with sequence tags generated from 91 different E. coli proteins purified by 2D gel electrophoresis. Fifty-one proteins were unambiguously identified by virtue of their sequence tags and estimated pI and mass, and a further 11 proteins were identified when sequence tags were combined with protein amino acid composition data. We conclude that the TagIdent identification approach is best suited to the identification of proteins from prokaryotes whose complete genome sequences are available.
The approach is less well suited to proteins from eukaryotes, because many eukaryotic proteins are not amenable to sequencing by Edman degradation, and identification by sequence tag cannot be unambiguous unless the organism's complete genome sequence is available.
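
The abstract describes two computations: measuring how often a k-residue terminal tag is unique within a set of proteins, and filtering a database by sequence tag plus estimated pI and mass. The Python sketch below illustrates both under simplifying assumptions; it is not TagIdent's code. The function names (`terminal_tag`, `unique_tag_fraction`, `tag_ident`), the toy database and the tolerance defaults are hypothetical. Tags are read directly from the first or last k residues of each stored sequence, so processing events such as initiator methionine removal or signal peptide cleavage are assumed to be already reflected in that sequence. Amino acid composition matching, which resolved a further 11 proteins in the study, is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


def terminal_tag(seq: str, k: int, terminus: str) -> str:
    """Return the first (N) or last (C) k residues of a sequence."""
    if terminus == "N":
        return seq[:k]
    if terminus == "C":
        return seq[-k:]
    raise ValueError("terminus must be 'N' or 'C'")


def unique_tag_fraction(sequences: Dict[str, str], k: int, terminus: str) -> float:
    """Fraction of proteins whose k-residue terminal tag occurs in no other protein."""
    tags = [terminal_tag(s, k, terminus) for s in sequences.values() if len(s) >= k]
    if not tags:
        return 0.0
    counts = Counter(tags)
    return sum(1 for t in tags if counts[t] == 1) / len(tags)


@dataclass
class Entry:
    accession: str
    sequence: str  # taken as the mature protein sequence (assumption)
    pi: float      # theoretical pI, assumed precomputed
    mass: float    # theoretical mass in Da, assumed precomputed


def tag_ident(db: List[Entry], pi_est: float, mass_est: float,
              n_tag: Optional[str] = None, c_tag: Optional[str] = None,
              pi_tol: float = 0.25, mass_tol_frac: float = 0.20) -> List[str]:
    """Accessions inside the pI/mass window whose termini match the given tags.

    The tolerance defaults are illustrative assumptions, not TagIdent's settings.
    """
    for tag in (n_tag, c_tag):
        if tag is not None and not 1 <= len(tag) <= 6:
            raise ValueError("tags must be 1 to 6 residues long")
    hits = []
    for e in db:
        if abs(e.pi - pi_est) > pi_tol:                      # outside pI window
            continue
        if abs(e.mass - mass_est) > mass_tol_frac * mass_est:  # outside mass window
            continue
        if n_tag is not None and not e.sequence.startswith(n_tag):
            continue
        if c_tag is not None and not e.sequence.endswith(c_tag):
            continue
        hits.append(e.accession)
    return hits


# Hypothetical toy database; accessions, sequences, pI and mass values are invented.
db = [
    Entry("HYPO1", "MKVLAAGIVGLL", 5.6, 30000.0),
    Entry("HYPO2", "MKVLSTRRQEEA", 5.7, 31000.0),
    Entry("HYPO3", "SAIRPGDELLKK", 5.5, 29500.0),
    Entry("HYPO4", "MTEYKLVVVGAG", 8.9, 30500.0),
]
seqs = {e.accession: e.sequence for e in db}

print(unique_tag_fraction(seqs, 4, "N"))  # 0.5: HYPO1 and HYPO2 share MKVL
print(unique_tag_fraction(seqs, 5, "N"))  # 1.0
print(tag_ident(db, 5.6, 30000.0, n_tag="MKVL"))                # ['HYPO1', 'HYPO2']
print(tag_ident(db, 5.6, 30000.0, n_tag="MKVL", c_tag="VGLL"))  # ['HYPO1']
```

In the toy data, a four-residue N-terminal tag leaves two candidates, while a five-residue tag is unique for every entry, a small-scale version of the observation that five-residue tags are more specific than four-residue ones. The pI and mass windows act as an independent filter: HYPO4 is excluded from a query at pI 5.6 even though it would otherwise be a candidate.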
