Analysis of large-scale MS data sets: the dramas and the delights

Abstract The biotechnology and pharmaceutical industries face the serious challenge of consolidating the enormous quantities of data generated by high-throughput proteomic applications. The bottleneck of validating these data and placing the resulting information in a sound biological context urgently needs to be addressed. Here, we review the issues that arise when analysing large quantities of data generated by liquid chromatography-mass spectrometry, offer potential solutions for data management and predict the future direction of large-scale data analysis by mass spectrometry.
