The Proteome Discovery Pipeline: A Data Analysis Pipeline for Mass Spectrometry-Based Differential P

Proteomics approaches enable the interrogation of large numbers of proteins, providing a more comprehensive understanding of biological systems. High-throughput proteomics typically uses liquid chromatography-mass spectrometry (LC-MS) for data acquisition, and bioinformatic analysis tools are essential for managing and mining the resulting high-volume data sets. Data analysis is currently a bottleneck for many proteomics researchers because complete, freely accessible, ready-to-use analysis systems are not available. In addition, most analysis systems require input from an experienced bioinformatician as soon as data are acquired. For proteomics to achieve its greatest impact in biology, data analysis must become more efficient and effective. We present the Proteome Discovery Pipeline (PDP), a web-based analysis platform that performs proteomics data analysis without requiring specialized hardware or, for initial analyses, input from bioinformatics specialists. Functionalities of the PDP include spectrum visualization, deconvolution, alignment, normalization, statistical significance tests, and pattern recognition. The PDP provides proteomics researchers with a user-friendly, web-based data analysis package that handles multiple file formats and supports data from multiple proteomics technology platforms. The system is flexible and extensible to enable further development. In this paper, we describe the development of the PDP and illustrate its capabilities through a case study analyzing human plasma proteomics data.
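The abstract lists normalization and statistical significance testing among the PDP's functions but does not say which methods are used. The following is a minimal sketch of that step for a two-group comparison. It assumes that intensities have already been deconvolved and aligned into a feature-by-sample matrix. The specific methods are illustrative assumptions, not the PDP's documented methods:

- median scaling for normalization,
- a two-sided permutation test on the difference of group means,
- Benjamini-Hochberg false-discovery-rate adjustment.

All function names (`median_normalize`, `permutation_pvalue`, `benjamini_hochberg`, `differential_features`) are hypothetical.

```python
"""Hypothetical sketch of one differential-analysis step: normalize, test, adjust.

Illustrative assumptions, not the PDP's documented methods:
- input is an aligned feature-by-sample intensity matrix (list of rows),
- normalization is median scaling per sample (no sample median may be zero),
- significance is a two-sided permutation test on the difference of means,
- multiple testing is controlled with Benjamini-Hochberg FDR adjustment.
"""
import random
from statistics import mean, median


def median_normalize(matrix):
    """Scale each sample (column) so its median equals the median of all sample medians."""
    n_samples = len(matrix[0])
    col_medians = [median(row[j] for row in matrix) for j in range(n_samples)]
    target = median(col_medians)
    return [[row[j] * target / col_medians[j] for j in range(n_samples)]
            for row in matrix]


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction counts the observed labeling and keeps p > 0.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_features(matrix, group_a, group_b, alpha=0.05):
    """Indices of features whose adjusted p-value is at most alpha."""
    norm = median_normalize(matrix)
    pvals = [permutation_pvalue([row[j] for j in group_a],
                                [row[j] for j in group_b])
             for row in norm]
    qvals = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(qvals) if q <= alpha]


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 features x 12 samples,
    # samples 0-5 form group A and samples 6-11 form group B.
    toy = [
        [10, 11, 12, 13, 14, 15, 30, 31, 32, 33, 34, 35],
        [20] * 12,
        [20] * 12,
    ]
    print(differential_features(toy, range(6), range(6, 12)))
```

In the toy matrix, only feature 0 differs between the groups. Its exact permutation p-value is 2/924 (about 0.002), which the random-permutation estimate approximates, so it is the feature expected to pass the 0.05 threshold. A permutation test is used here only because it needs nothing beyond the standard library. A t-test or a rank-based test would be equally plausible choices.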
