MaSS‐Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms

Mass Spectrometry (MS)‐based proteomics has become an essential tool in the study of proteins. Modern MS machines generate huge amounts of data, which can only be processed by novel algorithmic tools. However, in the absence of data benchmarks and ground-truth datasets, testing algorithmic integrity and reproducibility remains a challenging problem. To this end, MaSS‐Simulator is presented: an easy-to-use simulator that can be configured to generate MS/MS datasets for a wide variety of conditions with known ground truths. MaSS‐Simulator offers many configuration options that give the user a great degree of control over the test datasets, enabling rigorous and large‐scale testing of any proteomics algorithm. MaSS‐Simulator is assessed by comparing its output against experimentally generated spectra and spectra obtained from the NIST spectral library collections. The results show that spectra generated by MaSS‐Simulator closely match real spectra, with a relative‐error distribution centered around 25%. In contrast, the theoretical spectra for the same peptides have a relative‐error distribution centered around 150%. MaSS‐Simulator will enable developers to highlight the specific capabilities of their algorithms and provide strong evidence of any pitfalls they might face. Source code, executables, and a user manual for MaSS‐Simulator can be downloaded from https://github.com/pcdslab/MaSS-Simulator.
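For readers unfamiliar with the "theoretical spectra" used as the baseline above, the following is a minimal Python sketch of how singly charged b- and y-ion m/z values are conventionally computed for a peptide. It is not taken from MaSS‐Simulator; the function name `theoretical_by_ions` and the constant names are hypothetical and chosen for illustration. The masses are standard monoisotopic values, and no modifications, neutral losses, or intensities are modeled.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def theoretical_by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    Hypothetical helper for illustration only; not MaSS-Simulator's API.
    b_ions[i-1] is b_i and y_ions[i-1] is y_i, for i = 1 .. len(peptide) - 1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal prefixes plus one proton.
    b_ions = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)

    # y ions: C-terminal suffixes plus water plus one proton.
    y_ions = []
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = theoretical_by_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

A theoretical spectrum built this way contains only fragment positions, which is one plausible reason it would sit further from experimental spectra than a simulated spectrum that also models noise and intensity.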
