mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

UNLABELLED Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for 14-3-3β dynamic interaction network and prostate cancer glycoproteome. AVAILABILITY The software was written in C++ language and the source code is available for free through SourceForge website http://sourceforge.net/projects/mapdia/.This article is part of a Special Issue entitled: Computational Proteomics.

[1]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[2]  Brendan MacLean,et al.  Bioinformatics Applications Note Gene Expression Skyline: an Open Source Document Editor for Creating and Analyzing Targeted Proteomics Experiments , 2022 .

[3]  John D. Venable,et al.  Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra , 2004, Nature Methods.

[4]  Ludovic C. Gillet,et al.  Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system , 2013, Nature Methods.

[5]  Michael J MacCoss,et al.  Comparison of Data Acquisition Strategies on Quadrupole Ion Trap Instrumentation for Shotgun Proteomics , 2014, Journal of The American Society for Mass Spectrometry.

[6]  Jarrett D. Egertson,et al.  Multiplexed MS/MS for Improved Data Independent Acquisition , 2013, Nature Methods.

[7]  Anne-Claude Gingras,et al.  Affinity-purification mass spectrometry (AP-MS) of serine/threonine phosphatases. , 2007, Methods.

[8]  David A. Orlando,et al.  Revisiting Global Gene Expression Analysis , 2012, Cell.

[9]  Ben C. Collins,et al.  OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data , 2014, Nature Biotechnology.

[10]  Chih-Chiang Tsou,et al.  DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics , 2015, Nature Methods.

[11]  P. Bork,et al.  Systematic Discovery of In Vivo Phosphorylation Networks , 2007, Cell.

[12]  Tao Xu,et al.  Bioinformatics Applications Note Sequence Analysis Xdia: Improving on the Label-free Data-independent Analysis , 2022 .

[13]  M. Mann,et al.  MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification , 2008, Nature Biotechnology.

[14]  Ludovic C. Gillet,et al.  Rapid mass spectrometric conversion of tissue biopsy samples into permanent quantitative digital proteome maps , 2015, Nature Medicine.

[15]  Lincoln Stein,et al.  Reactome: a database of reactions, pathways and biological processes , 2010, Nucleic Acids Res..

[16]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[17]  Brendan MacLean,et al.  MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments , 2014, Bioinform..

[18]  Samuel I. Miller,et al.  Precursor acquisition independent from ion count: how to dive deeper into the proteomics ocean. , 2009, Analytical chemistry.

[19]  Navdeep Jaitly,et al.  DAnTE: a statistical tool for quantitative analysis of -omics data , 2008, Bioinform..

[20]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[21]  Ian M. Donaldson,et al.  iRefIndex: A consolidated protein interaction database with provenance , 2008, BMC Bioinformatics.

[22]  Deepayan Sarkar,et al.  Detecting differential gene expression with a semiparametric hierarchical mixture method. , 2004, Biostatistics.

[23]  Ludovic C. Gillet,et al.  Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis* , 2012, Molecular & Cellular Proteomics.

[24]  R. Aebersold,et al.  Analysis of protein complexes using mass spectrometry , 2007, Nature Reviews Molecular Cell Biology.

[25]  Jing Chen,et al.  Glycoproteomic Analysis of Prostate Cancer Tissues by SWATH Mass Spectrometry Discovers N-acylethanolamine Acid Amidase and Protein Tyrosine Kinase 7 as Signatures for Tumor Aggressiveness , 2014, Molecular & Cellular Proteomics.

[26]  Bin Zhang,et al.  PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse , 2011, Nucleic Acids Res..

[27]  Wade H. Dunham,et al.  Affinity‐purification coupled to mass spectrometry: Basic principles and strategies , 2012, Proteomics.

[28]  Pedro Jose Zufiria Zatarain,et al.  Generating Scale-free Networks with Adjustable Clustering Coefficient Via Random Walks , 2011 .

[29]  M. Mann,et al.  More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. , 2011, Journal of proteome research.

[30]  Hongzhe Li,et al.  A Markov random field model for network-based analysis of genomic data , 2007, Bioinform..

[31]  Nell Sedransk,et al.  Improved Normalization of Systematic Biases Affecting Ion Current Measurements in Label-free Proteomics Data* , 2014, Molecular & Cellular Proteomics.

[32]  Alexander Scherl,et al.  PAcIFIC: how to dive deeper into the proteomics ocean , 2009 .