File Formats Commonly Used in Mass Spectrometry Proteomics

The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge of analyzing MS data has required a wide variety of file formats to encode the complex data types that arise in MS workflows. These formats encode input instructions for instruments, the output products of the instruments, and several levels of information and results used and produced by informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics.
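Instrument output is one example of the complex data types described above. XML formats for instrument output, such as mzML, typically embed each spectrum's m/z and intensity arrays as base64-encoded binary, optionally zlib-compressed. The following is a minimal Python sketch of that encode/decode round trip. It assumes little-endian 64-bit floats. The names `encode_array` and `decode_array` and the peak values are hypothetical and for illustration only. This is not a parser for any real file format.

```python
import base64
import struct
import zlib


def encode_array(values, compress=True):
    """Pack floats as little-endian 64-bit doubles, optionally zlib-compress,
    then base64-encode so the bytes can sit inside an XML text element."""
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compress=True):
    """Reverse encode_array: base64-decode, optionally decompress,
    then unpack the little-endian 64-bit doubles."""
    raw = base64.b64decode(text)
    if compress:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit float
    return list(struct.unpack(f"<{count}d", raw))


# Hypothetical peak list for a single spectrum.
mz = [445.12, 519.14, 536.17]
intensity = [1200.0, 350.5, 9800.25]

encoded_mz = encode_array(mz)
encoded_int = encode_array(intensity)

# Decode both arrays and pair them back up into (m/z, intensity) peaks.
peaks = list(zip(decode_array(encoded_mz), decode_array(encoded_int)))
print(peaks)
```

Storing peaks as packed binary rather than decimal text keeps files compact and avoids rounding on a round trip, at the cost of human readability. A real reader would take the precision, byte order, and compression settings from the metadata that accompanies each array rather than assuming them.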
