Development of data representation standards by the Human Proteome Organization Proteomics Standards Initiative

Objective: To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI’s evolution, and future directions and synergies for the group.

Materials and Methods: The PSI has 5 categories of deliverables that have guided the group: minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release.

Results: We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. The PSI has produced a series of standard formats covering mass spectrometer input, mass spectrometer output, results of informatics analysis (both qualitative and quantitative analyses), reports of molecular interaction data, and gel electrophoresis analyses. We have produced controlled vocabularies that ensure that concepts are uniformly annotated in the formats, and we have engaged in extensive software development and dissemination efforts so that the standards can be used efficiently by the community.

Conclusion: In its first dozen years of operation, the PSI has produced many standards that have accelerated the field of proteomics by facilitating data exchange and deposition to data repositories. We look to the future to continue developing standards for new proteomics technologies and workflows, as well as mechanisms for integration with other omics data types. Our products facilitate the translation of genomics and proteomics findings to clinical and biological phenotypes. The PSI website can be accessed at http://www.psidev.info.
