SWATH2stats: An R/Bioconductor Package to Process and Convert Quantitative SWATH-MS Proteomics Data for Downstream Analysis Tools

SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.

[1]  Hyungwon Choi,et al.  mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry. , 2015, Journal of proteomics.

[2]  Lukas Käll,et al.  Solution to Statistical Challenges in Proteomics Is More Statistics, Not Less. , 2015, Journal of proteome research.

[3]  Lars Malmström,et al.  DIANA - algorithmic improvements for analysis of data-independent acquisition MS data , 2015, Bioinform..

[4]  Chih-Chiang Tsou,et al.  DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics , 2015, Nature Methods.

[5]  Eric W. Deutsch,et al.  A repository of assays to quantify 10,000 human proteins by SWATH-MS , 2014, Scientific Data.

[6]  Brendan MacLean,et al.  MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments , 2014, Bioinform..

[7]  Lars Malmström,et al.  aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data , 2014, Bioinform..

[8]  R. Aebersold,et al.  Selected reaction monitoring–based proteomics: workflows, potential, pitfalls and future directions , 2012, Nature Methods.

[9]  Ludovic C. Gillet,et al.  Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis* , 2012, Molecular & Cellular Proteomics.

[10]  R. Aebersold,et al.  mProphet: automated data processing and statistical validation for large-scale SRM experiments , 2011, Nature Methods.

[11]  Alexey I Nesvizhskii,et al.  Abacus: A computational tool for extracting and pre‐processing spectral count data for label‐free quantitative proteomic analysis , 2011, Proteomics.

[12]  A. Nesvizhskii A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. , 2010, Journal of proteomics.

[13]  Hadley Wickham,et al.  ggplot2 - Elegant Graphics for Data Analysis (2nd Edition) , 2017 .

[14]  J. Buhmann,et al.  Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry* , 2009, Molecular & Cellular Proteomics.

[15]  Hadley Wickham,et al.  Reshaping Data with the reshape Package , 2007 .

[16]  Steven P Gygi,et al.  Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry , 2007, Nature Methods.

[17]  John D. Venable,et al.  Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra , 2004, Nature Methods.

[18]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[19]  Cedric E. Ginestet ggplot2: Elegant Graphics for Data Analysis , 2011 .

[20]  Ruedi Aebersold,et al.  Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics. , 2010, Journal of proteome research.

[21]  Hyungwon Choi,et al.  False discovery rates and related statistical concepts in mass spectrometry-based proteomics. , 2008, Journal of proteome research.

[22]  William Stafford Noble,et al.  Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. , 2008, Journal of proteome research.

[23]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[24]  Brendan MacLean,et al.  Bioinformatics Applications Note Gene Expression Skyline: an Open Source Document Editor for Creating and Analyzing Targeted Proteomics Experiments , 2022 .