Bayesian nonparametric models for peak identification in MALDI-TOF mass spectroscopy

We present a novel nonparametric Bayesian approach based on Lévy Adaptive Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix-Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This model-based approach identifies and quantifies proteins through model parameters that are directly interpretable as the number of proteins, the mass and abundance of each protein, and the peak resolution, while retaining the ability to adapt to unknown smoothness, as wavelet-based methods do. Informative prior distributions on resolution are key to distinguishing true peaks from background noise and to resolving broad peaks into individual peaks for multiple protein species. Posterior distributions are obtained using a reversible jump Markov chain Monte Carlo algorithm and provide inference about the number of peaks (proteins), their masses, and their abundances. We show through simulation studies that the procedure has desirable true-positive and false-discovery rates. Finally, we illustrate the method on five example spectra: a blank spectrum, a spectrum containing only the matrix (a low-molecular-weight substance used to embed target proteins), a spectrum with known proteins, and a single spectrum and an average of ten spectra from an individual lung cancer patient.
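To make the interpretation of the parameters concrete, the following minimal sketch evaluates a mean spectrum as a sum of peaks, each described by a mass location, an abundance, and a resolution. It is an illustration under stated assumptions, not the model as specified in the paper: the Gaussian peak shape, the convention that resolution equals mass divided by the full width at half maximum (FWHM), and the names `gaussian_peak` and `mean_spectrum` are all hypothetical choices made here. The prior distributions and the reversible jump sampler are not shown.

```python
import math


def gaussian_peak(mz, center, abundance, resolution):
    """Height at `mz` of one peak centered at `center`.

    Assumptions (not taken from the paper): the peak is Gaussian and
    resolution is defined as center / FWHM, so a higher resolution
    gives a narrower peak.
    """
    fwhm = center / resolution
    # Convert FWHM to a Gaussian standard deviation.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return abundance * math.exp(-0.5 * ((mz - center) / sigma) ** 2)


def mean_spectrum(mz_grid, peaks):
    """Sum the peaks at every point of `mz_grid`.

    `peaks` is a list of (center, abundance, resolution) tuples; its
    length plays the role of the number of proteins.
    """
    return [
        sum(gaussian_peak(mz, c, a, r) for (c, a, r) in peaks)
        for mz in mz_grid
    ]


if __name__ == "__main__":
    # Two hypothetical, partly overlapping peaks.
    peaks = [(1000.0, 2.0, 500.0), (1003.0, 1.0, 500.0)]
    grid = [998.0 + 0.5 * i for i in range(15)]
    for mz, height in zip(grid, mean_spectrum(grid, peaks)):
        print(f"{mz:8.1f}  {height:.4f}")
```

Because the peak width in this sketch is tied to resolution, two nearby peaks with a realistic resolution remain distinguishable from a single broad peak, which is the kind of role the abstract attributes to informative priors on resolution.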
