mzML—a Community Standard for Mass Spectrometry Data

Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that facilitates data sharing and analysis. Early efforts to develop such a standard produced multiple formats, notably mzXML and mzData, each designed with a different underlying philosophy. To resolve the problems caused by competing formats, vendors, researchers, and software developers convened under the banner of the Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) to develop a single standard. The new format incorporated many of the desirable technical attributes of the previous formats while adding a number of improvements: a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring (SRM) data, and implementations that were available immediately to facilitate rapid adoption by the community. The resulting standard, mzML, is a well-tested, open-source format for mass spectrometer output files that can be readily used by the community and easily adapted to incremental advances in mass spectrometry technology.
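The controlled-vocabulary design means that mzML metadata is carried by generic `cvParam` elements that reference PSI-MS accessions (for example `MS:1000511`, "ms level"), rather than by format-specific tags, which is what makes automated validation possible. The sketch below uses only Python's standard library to build a simplified, hand-written `<spectrum>` fragment, read its CV terms, decode its base64-encoded, little-endian binary arrays, and apply one CV-based rule. It is illustrative only: the fragment omits attributes a valid file requires (such as `encodedLength`), handles only uncompressed arrays, and the helper names and the "exactly one precision term" rule are hypothetical assumptions, not the PSI semantic validator or any official mzML API.

```python
import base64
import struct
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
NS = {"mz": MZML_NS}

# PSI-MS controlled-vocabulary accessions used in this sketch
MS_LEVEL = "MS:1000511"
MZ_ARRAY = "MS:1000514"
INTENSITY_ARRAY = "MS:1000515"
FLOAT32 = "MS:1000521"
FLOAT64 = "MS:1000523"
NO_COMPRESSION = "MS:1000576"


def encode(values):
    """Pack floats as little-endian 64-bit doubles and base64-encode them."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def build_spectrum(mz, intensity):
    """Return a simplified, hypothetical <spectrum> fragment (not a valid file)."""
    def array(term, name, values):
        return (
            "<binaryDataArray>"
            f'<cvParam cvRef="MS" accession="{FLOAT64}" name="64-bit float"/>'
            f'<cvParam cvRef="MS" accession="{NO_COMPRESSION}" name="no compression"/>'
            f'<cvParam cvRef="MS" accession="{term}" name="{name}"/>'
            f"<binary>{encode(values)}</binary>"
            "</binaryDataArray>"
        )
    return (
        f'<spectrum xmlns="{MZML_NS}" index="0" id="scan=1" '
        f'defaultArrayLength="{len(mz)}">'
        f'<cvParam cvRef="MS" accession="{MS_LEVEL}" name="ms level" value="2"/>'
        '<binaryDataArrayList count="2">'
        + array(MZ_ARRAY, "m/z array", mz)
        + array(INTENSITY_ARRAY, "intensity array", intensity)
        + "</binaryDataArrayList></spectrum>"
    )


def cv_terms(elem):
    """Map accession -> value for the direct cvParam children of elem."""
    return {p.get("accession"): p.get("value") for p in elem.findall("mz:cvParam", NS)}


def read_spectrum(xml_text):
    """Return (ms level, m/z values, intensities) from the fragment."""
    spectrum = ET.fromstring(xml_text)
    length = int(spectrum.get("defaultArrayLength"))
    arrays = {}
    for bda in spectrum.iterfind("mz:binaryDataArrayList/mz:binaryDataArray", NS):
        terms = cv_terms(bda)
        if NO_COMPRESSION not in terms:
            raise ValueError("this sketch only decodes uncompressed arrays")
        code = "d" if FLOAT64 in terms else "f"  # 64-bit vs 32-bit floats
        raw = base64.b64decode(bda.find("mz:binary", NS).text)
        kind = "mz" if MZ_ARRAY in terms else "intensity"
        arrays[kind] = list(struct.unpack(f"<{length}{code}", raw))
    level = int(cv_terms(spectrum)[MS_LEVEL])
    return level, arrays["mz"], arrays["intensity"]


def check_precision_terms(xml_text):
    """Hypothetical CV rule: each binary array declares exactly one precision term."""
    spectrum = ET.fromstring(xml_text)
    problems = []
    for i, bda in enumerate(spectrum.iterfind(".//mz:binaryDataArray", NS)):
        terms = cv_terms(bda)
        n = sum(t in terms for t in (FLOAT32, FLOAT64))
        if n != 1:
            problems.append(f"binaryDataArray {i}: {n} precision terms")
    return problems


if __name__ == "__main__":
    xml_text = build_spectrum([100.5, 200.25, 300.125], [10.0, 20.0, 5.0])
    level, mz, inten = read_spectrum(xml_text)
    print(level, mz, inten)  # 2 [100.5, 200.25, 300.125] [10.0, 20.0, 5.0]
    print(check_precision_terms(xml_text))  # []
```

Because every array states its own precision and compression as CV terms, a reader can decode it without format-specific knowledge, and a validator can check term usage mechanically. Production code should rely on an established mzML reader rather than a hand-rolled parser like this one.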
