Experimental Peptide Identification Repository (EPIR)

LC-MS/MS has become an established technology in proteomic studies, and as the technology has matured the bottleneck has shifted from data generation to data validation and mining. To address this bottleneck we developed the Experimental Peptide Identification Repository (EPIR), an integrated software platform for storage, validation, and mining of LC-MS/MS-derived peptide evidence. EPIR is a cumulative data repository in which precursor ions are linked to the peptide assignments and protein associations returned by a search engine (e.g., Mascot, Sequest, or PepSea). Any number of datasets can be parsed into EPIR and subsequently validated and mined using a set of software modules that overlay the database. These include a peptide validation module, a protein grouping module, a generic module for extracting quantitative data, a comparative module, and additional modules for extracting statistical information. In the present study, the utility of EPIR and its associated software tools is demonstrated on LC-MS/MS data derived from a set of model proteins and from complex protein mixtures derived from MCF-7 breast cancer cells. Emphasis is placed on the key strengths of EPIR, including the ability to validate and mine multiple combined datasets and the presentation of protein-level evidence in concise, nonredundant protein groups that are based on shared peptide evidence.
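The idea of nonredundant protein groups based on shared peptide evidence can be illustrated with a minimal sketch. This is not EPIR's grouping module, whose rules are not described here. It assumes a simple scheme in which proteins supported by exactly the same peptides are merged into one group, and a group is dropped when its peptide evidence is a strict subset of another group's. The function name `group_proteins`, the input layout, and the peptide and protein identifiers are all hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Collapse proteins into nonredundant groups by shared peptide evidence.

    Illustrative assumption, not EPIR's actual algorithm:
    `peptide_to_proteins` maps each validated peptide sequence to the
    protein accessions it was associated with by the search engine.
    """
    # Invert the mapping: protein -> set of peptides that support it.
    protein_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_peptides[protein].add(peptide)

    # Proteins with identical peptide evidence cannot be told apart,
    # so they are placed in the same group.
    by_evidence = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_evidence[frozenset(peptides)].append(protein)

    # Drop any group whose evidence is a strict subset of another group's
    # evidence, since it adds no peptide of its own.
    groups = []
    for peptides, proteins in by_evidence.items():
        subsumed = any(peptides < other for other in by_evidence)
        if not subsumed:
            groups.append((sorted(proteins), sorted(peptides)))
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical evidence: P1 and P2 share both peptides, P3 is supported
    # only by a peptide that P1 and P2 also explain, P4 stands alone.
    evidence = {
        "AAAK": ["P1", "P2"],
        "CCCR": ["P1", "P2", "P3"],
        "DDDK": ["P4"],
    }
    for proteins, peptides in group_proteins(evidence):
        print(proteins, peptides)
```

In this sketch P1 and P2 form one group, P3 is subsumed because its only peptide is already explained by that group, and P4 forms a group of its own. A real grouping module would likely also need to handle partially overlapping evidence and peptide validation scores, which are left out here.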
