A hybrid retention time alignment algorithm for SWATH‐MS data

Recently, data‐independent acquisition (DIA) MS has gained popularity as a qualitative–quantitative workflow for proteomics. One outstanding problem in the analysis of DIA‐MS data is the alignment of chromatographic retention times across multiple samples, which facilitates peptide identification and accurate quantification. Here, we present a novel hybrid (profile‐based and feature‐based) algorithm for LC‐MS alignment and test it on data from sequential windowed acquisition of all theoretical fragment ion mass spectra (SWATH), a type of DIA. Our algorithm first uses profile‐based dynamic time warping to obtain a coarse alignment that corrects large retention time shifts, and then uses feature‐based bipartite matching to pair individual features at a fine scale. We evaluated our method by comparing the aligned feature pairs with peptide identifications from the pseudo‐MS2 spectra exported by DIA‐Umpire, a recently reported tool for deconvoluting DIA‐MS data. We propose that our method can be used to align DIA‐MS data prior to identification, and that the alignment can then be used to remove noise peaks or to screen for differentially changed features. We found that a simple alignment‐enabled denoising scheme reduced the number of pseudo‐MS2 spectra exported by DIA‐Umpire by up to approximately 40% while retaining a comparable number of identifications. Finally, we demonstrate the utility of our tool for accurate label‐free relative quantification across multiple SWATH runs.
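The coarse, profile‐based step can be illustrated with a minimal dynamic time warping sketch. It assumes that each run has already been reduced to a one‐dimensional intensity profile (for example, a total ion chromatogram) sampled on an increasing retention time grid. The function names `dtw_path` and `rt_map`, the absolute‐difference local cost, and the piecewise‐linear interpolation of the warping path are illustrative assumptions, not the authors' implementation.

```python
import bisect


def dtw_path(x, y):
    """Classic DTW between two 1-D intensity profiles (illustrative sketch).

    Local cost is |x[i] - y[j]|; allowed moves are diagonal, vertical and
    horizontal; no global band constraint is applied.
    Returns the optimal warping path as a list of (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]  # cumulative cost matrix
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the last cell, always stepping to the cheapest predecessor
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def rt_map(path, rt_x, rt_y):
    """Turn a DTW path into a callable mapping run-y RTs onto run-x RTs.

    Each y index is anchored to the mean RT of the x indices it was paired
    with; RTs between anchors are linearly interpolated, ends are clamped.
    """
    buckets = {}
    for i, j in path:
        buckets.setdefault(j, []).append(rt_x[i])
    ys = sorted(buckets)
    anchors_y = [rt_y[j] for j in ys]
    anchors_x = [sum(buckets[j]) / len(buckets[j]) for j in ys]

    def warp(t):
        k = bisect.bisect_left(anchors_y, t)
        if k == 0:
            return anchors_x[0]
        if k == len(anchors_y):
            return anchors_x[-1]
        y0, y1 = anchors_y[k - 1], anchors_y[k]
        x0, x1 = anchors_x[k - 1], anchors_x[k]
        return x0 + (x1 - x0) * (t - y0) / (y1 - y0)

    return warp


# Toy example (assumed data): the same peak elutes two minutes later in run y
rt = [float(t) for t in range(10)]  # shared RT grid in minutes
x = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]
warp = rt_map(dtw_path(x, y), rt, rt)
print(warp(5.0))  # apex RT in run y mapped onto run x -> 3.0
```

Because the warp is learned from whole profiles rather than from individual peaks, it can absorb large, nonlinear shifts; it is then only expected to be accurate to within a tolerance, which is what a subsequent feature‐level step can refine.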
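The fine, feature‐based step can likewise be sketched as a minimum‐cost bipartite assignment between the features of two runs. This sketch assumes that features are (m/z, retention time) pairs, that a coarse retention time mapping `warp` from run B onto run A is already available as a callable, and that candidate pairs outside an m/z tolerance (in ppm) or a retention time tolerance are disallowed. The tolerances, the cost function, and the names `match_features` and `hungarian` are hypothetical; the assignment is solved here with the Hungarian (Kuhn–Munkres) algorithm, which is not necessarily the matching procedure used in the paper.

```python
BIG = 1e6  # cost for disallowed pairs; such pairs are discarded after solving


def hungarian(cost):
    """Minimum-cost assignment on a square matrix (Kuhn-Munkres, O(n^3)).

    Returns assign, where assign[i] is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row matched to column j, 0 = free
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matched/unmatched edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def match_features(run_a, run_b, warp, mz_tol_ppm=10.0, rt_tol=1.0):
    """Pair (mz, rt) features of run_a and run_b; run_b RTs are mapped by warp."""
    n = max(len(run_a), len(run_b))
    cost = [[BIG] * n for _ in range(n)]  # padded to a square matrix
    for i, (mz_a, rt_a) in enumerate(run_a):
        for j, (mz_b, rt_b) in enumerate(run_b):
            d_ppm = abs(mz_a - mz_b) / mz_a * 1e6
            d_rt = abs(rt_a - warp(rt_b))
            if d_ppm <= mz_tol_ppm and d_rt <= rt_tol:
                cost[i][j] = d_ppm / mz_tol_ppm + d_rt / rt_tol
    assign = hungarian(cost)
    return [(i, j) for i, j in enumerate(assign)
            if i < len(run_a) and j < len(run_b) and cost[i][j] < BIG]


# Greedy nearest-neighbor would pair A1 with B0 and leave A0 unmatched
run_a = [(500.0, 10.0), (500.0, 10.4)]
run_b = [(500.0, 10.3), (500.0, 10.6)]
print(match_features(run_a, run_b, lambda t: t, rt_tol=0.5))  # -> [(0, 0), (1, 1)]
```

Solving the pairing as a global assignment, rather than greedily taking each feature's nearest neighbor, keeps a feature from being claimed by a slightly closer neighbor when that choice would leave another compatible feature without a partner.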
