BPDA - A Bayesian peptide detection algorithm for mass spectrometry

Background: Mass spectrometry (MS) is an essential analytical tool in proteomics. Many existing algorithms for peptide detection are based on isotope template matching and usually process each charge state separately, which makes them ineffective at detecting overlapping peptides and low-abundance peptides.

Results: We present BPDA, a Bayesian approach for peptide detection in data produced by MS instruments with resolution high enough to baseline-resolve isotopic peaks, such as MALDI-TOF and LC-MS. We model the spectrum as a mixture of candidate peptide signals, with the model parameterized by MS physical properties. BPDA is based on a rigorous statistical framework and avoids problems, such as voting and ad hoc thresholding, generally encountered in algorithms based on template matching. It systematically evaluates all possible combinations of peptide candidates to interpret a given spectrum, and iteratively finds the best-fitting peptide signal so as to minimize the mean squared error between the inferred spectrum and the observed spectrum. In contrast to previous detection methods, BPDA performs deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals and yields higher sensitivity and more robust results. Unlike template-matching algorithms, BPDA can handle complex data where features overlap. Our experimental results indicate that BPDA performs well on simulated data and real MS data sets, for various resolutions and signal-to-noise ratios, and compares very favorably with commonly used commercial and open-source software, such as flexAnalysis, OpenMS, and Decon2LS, in terms of sensitivity and detection accuracy.

Conclusion: Unlike previous detection methods, which employ only isotopic distributions and work at each charge state alone, BPDA also takes the charge state distribution into account, which provides additional information to better identify weak peptide signals and produce more robust results. The proposed approach is based on a rigorous statistical framework, which avoids problems generally encountered in algorithms based on template matching. Our experiments indicate that BPDA performs well on both simulated data and real data, and compares very favorably with commonly used commercial and open-source software. The BPDA software can be downloaded from http://gsp.tamu.edu/Publications/supplementary/sun10a/bpda.
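
As a rough illustration of the mixture-of-candidate-signals idea described above, the following Python sketch models a spectrum as a weighted sum of peptide templates, where each template combines isotopic peaks across several charge states, and then selects candidates iteratively so as to reduce the squared error against the observed spectrum. It is not the BPDA implementation. It substitutes a greedy least-squares selection (orthogonal matching pursuit) for Bayesian inference, and it assumes Gaussian peak shapes, equal weighting of charge states, a Poisson approximation to the isotopic envelope, and no non-negativity constraint on abundances. All function names, parameters, and constants are hypothetical and chosen only for illustration.

```python
import numpy as np
from math import exp, factorial

PROTON = 1.00728       # proton mass in Da
ISO_SPACING = 1.00335  # approximate spacing between isotopic peaks in Da

def isotope_weights(mass, n_peaks=5):
    # Poisson stand-in for the isotopic envelope; the 1/1800 scaling is a
    # rough illustrative assumption, not a value taken from the paper.
    lam = mass / 1800.0
    w = np.array([exp(-lam) * lam**k / factorial(k) for k in range(n_peaks)])
    return w / w.sum()

def peptide_signal(mz, mass, charges=(1, 2), sigma=0.02):
    # Template of one peptide: Gaussian isotopic peaks summed over charge
    # states (equal charge-state weights are a simplifying assumption).
    w = isotope_weights(mass)
    signal = np.zeros_like(mz, dtype=float)
    for z in charges:
        for k, wk in enumerate(w):
            center = (mass + k * ISO_SPACING + z * PROTON) / z
            signal += wk * np.exp(-0.5 * ((mz - center) / sigma) ** 2)
    return signal

def greedy_fit(mz, observed, candidate_masses, max_peptides=10, tol=1e-12):
    # Greedy selection with joint least-squares refit (orthogonal matching
    # pursuit), used here as a simple stand-in for Bayesian inference.
    observed = np.asarray(observed, dtype=float)
    templates = {m: peptide_signal(mz, m) for m in candidate_masses}
    active, amps = [], np.array([])
    residual = observed.copy()
    for _ in range(max_peptides):
        best_mass, best_gain = None, 0.0
        for m, t in templates.items():
            norm = t @ t
            if m in active or norm == 0.0:
                continue
            # Squared-error reduction from fitting this template alone
            # to the current residual.
            gain = (residual @ t) ** 2 / norm
            if gain > best_gain:
                best_mass, best_gain = m, gain
        # Stop once no candidate lowers the mean squared error noticeably.
        if best_mass is None or best_gain / mz.size < tol:
            break
        active.append(best_mass)
        # Refit all selected abundances jointly so that overlapping
        # templates share the signal consistently.
        A = np.column_stack([templates[m] for m in active])
        amps, *_ = np.linalg.lstsq(A, observed, rcond=None)
        residual = observed - A @ amps
    return dict(zip(active, amps))

# Two peptides whose isotopic envelopes overlap (masses 2 Da apart).
mz = np.arange(500.0, 1510.0, 0.005)
observed = 3.0 * peptide_signal(mz, 1000.0) + 1.0 * peptide_signal(mz, 1002.0)
fit = greedy_fit(mz, observed, [1000.0, 1001.0, 1002.0, 1500.0])
for mass, abundance in sorted(fit.items()):
    print(f"mass {mass:.1f} Da: abundance {abundance:.3f}")
```

Because each template spans every charge state, evidence from the singly and doubly charged envelopes is pooled in a single fit rather than matched per charge state, and the joint refit lets overlapping envelopes be separated instead of being attributed to one peptide.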
