PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets

The original PRIDE Inspector tool was developed as an open-source standalone tool for visualizing and validating mass spectrometry (MS)-based proteomics data, either before submission to the Proteomics Identifications (PRIDE) database or once publicly available there. The initial implementation focused on visualizing PRIDE data by supporting the PRIDE XML format and direct access to private (password-protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable better integration of existing public proteomics repositories, maximizing their benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, together with the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that builds upon the success of the original PRIDE Inspector and enables users to visualize and validate PX “complete” submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak list formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/.
