A tutorial for software development in quantitative proteomics using PSI standard formats

The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards (mzML, mzIdentML and mzQuantML) that can be used together to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. We briefly describe the content of each data model, in sufficient detail for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs) developed in Java for each of these standards, which make it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
