A flexible statistical model for alignment of label-free proteomics data – incorporating ion mobility and product ion information

Background: The goal of many proteomics experiments is to determine the abundance of proteins in biological samples and how that abundance varies across physiological conditions. High-throughput quantitative proteomics, specifically label-free LC-MS/MS, allows rapid measurement of thousands of proteins, enabling large-scale studies of biological systems. Before these information-rich datasets can be analyzed, the raw data must undergo several computational processing steps. We present a method that addresses one essential step in proteomics data processing: the matching of peptide measurements across samples.

Results: We describe a novel method for label-free proteomics data alignment that can incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion information. We compare the results of our alignment method to those of PEPPeR and OpenMS, and we compare the alignment accuracy achieved by different versions of our method that use different data characteristics. Our method yields higher match recall rates and similar or lower mismatch rates than PEPPeR and OpenMS feature-based alignment. We also show that including drift time and product ion information yields higher recall rates and more confident matches without increasing error rates.

Conclusions: Based on the results presented here, we argue that incorporating ion mobility drift time and product ion information into alignment is worthwhile. Alignment methods should be flexible enough to use all available data, particularly given recent advances in experimental separation methods.
