Comparative bioinformatic analysis of complete proteomes and protein parameters for cross‐species identification in proteomics

Peptide mass fingerprinting (PMF) remains the most amenable technique for protein identification in proteomics, with mass spectrometry as the primary analytical technique, coupled with bioinformatics. PMF relies on the amino acid sequence of the protein being present in current databanks. Nevertheless, it is desirable to be able to use the technique for organisms whose genomes are not yet fully sequenced, by applying cross‐species protein identification. In this study, we have re‐examined the feasibility of such approaches by considering the extent of protein similarity between genome sequences, using a data set of 29 complete bacterial and two eukaryotic genomes. A range of protein and peptide features are considered, including protein isoelectric point, protein mass, and amino acid conservation. The effectiveness of PMF approaches has then been tested in a series of computer simulations with varying peptide number and mass accuracy for several cross‐species tests. The results show that PMF alone is generally unsuitable for jumps between divergent species, or when sequence identity between proteins falls below 70%. Even so, tryptic peptide conservation is considerably enriched above random, and PMF promises to remain useful for cross‐species protein identification when combined with data other than peptide masses alone.
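To make the idea of a cross‐species PMF simulation concrete, the following is a minimal sketch in Python, not the procedure used in the study. It assumes the common trypsin rule (cleavage after K or R, except before P), monoisotopic residue masses, and a tolerance expressed in ppm; the function names, the `min_len` cut‐off, and the two toy sequences are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: in-silico tryptic digestion and peptide mass
# matching between a query protein and a putative ortholog.
# All names and parameters here are assumptions, not taken from the study.

# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def tryptic_peptides(seq, min_len=5):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Very short peptides are rarely informative, so drop them.
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(query_masses, target_masses, ppm=50.0):
    """Count query masses that lie within +/- ppm of any target mass."""
    hits = 0
    for q in query_masses:
        tol = q * ppm / 1e6
        if any(abs(q - t) <= tol for t in target_masses):
            hits += 1
    return hits


# Toy example: two sequences that differ by a single A -> S substitution.
seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
seq_b = "MKTSYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

masses_a = [peptide_mass(p) for p in tryptic_peptides(seq_a)]
masses_b = [peptide_mass(p) for p in tryptic_peptides(seq_b)]
shared = count_matches(masses_a, masses_b, ppm=50.0)
print(f"{shared} of {len(masses_a)} peptide masses conserved")
```

A single substitution removes the match for the whole tryptic peptide that contains it, which is one way to see why peptide‐mass conservation can fall off much faster than sequence identity as species diverge. A more realistic simulation would also need to model missed cleavages, modifications, and random matches against a full database.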
