The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results

We report the release of mzIdentML, an exchange standard for peptide and protein identification data. The format was developed by the Proteomics Standards Initiative in collaboration with instrument and software vendors and with the developers of the major open-source projects in proteomics. Software implementations have been developed to convert from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments allow proteomics scientists to begin using the standard to exchange and publish data sets in support of publications, and they give bioinformatics groups and commercial software vendors a stable platform built on a single file format for identification data.
