PepSeeker: a database of proteome peptide identifications for investigating fragmentation patterns

Proteome science relies on bioinformatics tools to characterize proteins via their proteolytic peptides. These peptides are identified from the characteristic mass spectra generated when their ions fragment in the gas phase within the mass spectrometer. Database search tools (e.g. SEQUEST, Mascot, X!Tandem, Phenyx) compare the resulting secondary ion mass spectra with protein sequence databases in order to identify the amino acid sequence. Although these tools are frequently successful, much is still not understood about the amino acid sequence patterns that promote or protect against particular fragmentation pathways, and hence lead to the presence or absence of particular ions from different ion series. To advance this area, we have developed a database, PepSeeker, which captures peptide identification and ion information from proteome experiments. The database currently contains >185 000 peptides and associated database search information. Users may query this resource to retrieve peptide, protein and spectral information based on protein or peptide information, including the amino acid sequence itself, represented by regular expressions, coupled with ion series information. We believe this database will be useful to proteome researchers wishing to understand gas phase peptide ion chemistry in order to improve peptide identification strategies. Questions can be addressed to j.selley@manchester.ac.uk.
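As an illustration of the kind of query described above (a sequence pattern given as a regular expression, coupled with ion series information), the following is a minimal sketch in Python. It does not use PepSeeker's schema or interface, which are not described here; the `PeptideHit` record, the `search` function, and the example peptides are all hypothetical and exist only to show how the two criteria might be combined.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PeptideHit:
    """Hypothetical record for one identified peptide (not PepSeeker's schema)."""
    sequence: str   # amino acid sequence, one-letter codes
    protein: str    # identifier of the parent protein
    # ion series name (e.g. "b", "y") -> fragment positions matched in the spectrum
    ions: dict = field(default_factory=dict)


def search(hits, pattern, series=None, min_ions=1):
    """Return hits whose sequence matches `pattern` and, if `series` is given,
    that have at least `min_ions` matched ions in that series."""
    regex = re.compile(pattern)
    selected = []
    for hit in hits:
        if not regex.search(hit.sequence):
            continue
        if series is not None and len(hit.ions.get(series, [])) < min_ions:
            continue
        selected.append(hit)
    return selected


# Made-up example data, for illustration only.
hits = [
    PeptideHit("AVDLPGK", "P1", {"b": [2, 3], "y": [1, 2, 3, 4]}),
    PeptideHit("LSSPATLNSR", "P2", {"y": [5, 6]}),
    PeptideHit("GEDFK", "P3", {"b": [1, 2, 3]}),
]

# Peptides with any residue followed by proline and at least two matched y ions.
result = search(hits, r".P", series="y", min_ions=2)
print([hit.sequence for hit in result])
```

A query of this form would let a researcher ask, for example, whether peptides containing a residue N-terminal to proline tend to show a particular ion series, which is the type of fragmentation question the database is intended to support.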
