A High-Throughput Bioinformatics Platform for Mass Spectrometry-Based Proteomics

The success of mass spectrometry-based proteomics in emerging applications such as biomarker discovery and clinical diagnostics is predicated substantially on its ability to meet growing demands for throughput. Support for high throughput implies sophisticated tracking of experiments and their experimental steps, larger amounts of data to be organized and summarized, more complex algorithms for inferring and tracking protein expression across multiple experiments, statistical methods to assess data quality, and a streamlined proteomics-centric bioinformatics environment to establish the biological context and relevance of the experimental measurements. This paper presents a bioinformatics platform that was built for an industrial mass spectrometry-based proteomics laboratory focusing on biomarker discovery. The basis of the platform is a robust and scalable information management environment, supported by database and workflow management technology, that integrates heterogeneous data, applications, and processes across the entire laboratory workflow. This paper focuses on selected features of the platform, which include: (a) a method for improving the accuracy of protein assignment, (b) novel software tools for protein expression analysis that combine differential MS quantitation with tandem MS for peptide identification, and (c) integration of methods to aid in assessing the biological relevance and statistical significance of differentially expressed proteins.
