DIAproteomics: A multi-functional data analysis pipeline for data-independent-acquisition proteomics and peptidomics

Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. Its main advantages over data-dependent acquisition (DDA) include greater reproducibility, sensitivity and dynamic range. However, data analysis is complex and often requires expert knowledge, particularly for large-scale datasets. Here we present DIAproteomics, a multi-functional, automated, high-throughput pipeline implemented in Nextflow that makes it easy to process proteomics and peptidomics DIA datasets on diverse compute infrastructures. Its central components are well-established tools such as OpenSwathWorkflow for DIA spectral library search and PyProphet for false discovery rate assessment. In addition, the pipeline provides options to generate spectral libraries from existing DDA data and to carry out retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from statistical post-processing, as well as fold-changes computed across pairwise conditions that are predefined in an experimental design. DIAproteomics is open-source software and is available to the scientific community under a permissive license at https://www.openms.de/diaproteomics/.
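To illustrate what computing fold-changes across pairwise conditions from an experimental design can look like, the following minimal Python sketch averages intensities per condition and reports a log2 ratio for every pair of conditions. It is not the pipeline's actual statistical post-processing. The sample names, condition labels, peptide identifiers, intensity values and the function `pairwise_log2_fold_changes` are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical experimental design: sample name -> condition label.
design = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}

# Hypothetical peptide intensities per sample (illustrative values only).
intensities = {
    "PEPTIDEA": {"s1": 100.0, "s2": 120.0, "s3": 400.0, "s4": 480.0},
    "PEPTIDEB": {"s1": 50.0, "s2": 50.0, "s3": 25.0, "s4": 25.0},
}


def pairwise_log2_fold_changes(intensities, design):
    """Return {(peptide, cond_a, cond_b): log2(mean_b / mean_a)} for all condition pairs."""
    conditions = sorted(set(design.values()))
    result = {}
    for peptide, by_sample in intensities.items():
        # Mean intensity of this peptide within each condition.
        means = {
            cond: mean(v for s, v in by_sample.items() if design[s] == cond)
            for cond in conditions
        }
        # One log2 ratio per unordered pair of conditions (alphabetical order).
        for cond_a, cond_b in combinations(conditions, 2):
            result[(peptide, cond_a, cond_b)] = math.log2(means[cond_b] / means[cond_a])
    return result


fold_changes = pairwise_log2_fold_changes(intensities, design)
for key, value in fold_changes.items():
    print(key, round(value, 3))
```

A positive value means the peptide is more abundant in the second condition of the pair. A real analysis would also need to handle missing values, normalization and significance testing, which this sketch deliberately omits.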
