MSSimulator: Simulation of mass spectrometry data.

Mass spectrometry coupled to liquid chromatography (LC-MS and LC-MS/MS) is commonly used to analyze the protein content of biological samples in large-scale studies. It enables the identification and quantitation of proteins and peptides through a wide range of experimental protocols, algorithms, and statistical models. Comparing the many algorithms available for these tasks is currently difficult. Curated benchmark data exist for peptide identification algorithms, but data that provide a ground truth for evaluating the analysis of LC-MS data are limited. Hence, there have been attempts to simulate such data in a controlled fashion so that algorithms can be evaluated and compared. We present MSSimulator, a simulation software for LC-MS and LC-MS/MS experiments. Starting from a list of proteins in a FASTA file, the simulation performs in-silico digestion, retention time prediction, ionization filtering, and raw signal simulation (including MS/MS), and offers many options to change properties of the resulting data, such as elution profile shape, resolution, and sampling rate. Several protocols for SILAC, iTRAQ, or MS(E) are available in addition to the usual label-free approach, making MSSimulator the most comprehensive simulator for LC-MS and LC-MS/MS data.
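
To illustrate the first stage of such a pipeline, the following is a minimal sketch of in-silico tryptic digestion. It is not MSSimulator's implementation: the function name `tryptic_digest`, its parameters, and the simple cleavage rule (cut after K or R unless the next residue is P) are assumptions made for illustration only.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Illustrative tryptic digestion (hypothetical helper, not MSSimulator code).

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. Peptides spanning up to `missed_cleavages` uncut
    sites are also reported.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKRPGKLM"))
    print(tryptic_digest("AKRPGKLM", missed_cleavages=1))
```

In a full simulation, each resulting peptide would then be passed on to retention time prediction, ionization filtering, and raw signal generation.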
