DIA-NN: Deep neural networks substantially improve the identification performance of Data-independent acquisition (DIA) in proteomics

Data-independent acquisition mass spectrometry (DIA-MS) strategies, such as SWATH-MS, have been developed to increase consistency, quantification precision and proteomic depth in label-free proteomic experiments. They aim to overcome stochasticity in the selection of precursor ions by using mass-windowed acquisition followed by computational reconstruction of the chromatograms. While DIA methods increasingly outperform typical data-dependent methods in identification consistency and precision, particularly on large sample series, there remains room for further improvement. At present, software analysis pipelines extract only a fraction of the information recorded in the complex DIA spectra. Here we present a software tool, DIA-NN, that introduces artificial neural networks and a new quantification strategy to enhance signal processing in DIA data. DIA-NN greatly improves identification of precursor ions and, as a consequence, protein quantification accuracy. The performance of DIA-NN demonstrates that deep learning provides opportunities to improve the analysis of data-independent acquisition workflows in proteomics.
