Bayesian Inference for Gene Expression and Proteomics: Nonparametric Models for Proteomic Peak Identification and Quantification

Abstract

We present model-based inference for proteomic peak identification and quantification from mass spectroscopy data, focusing on nonparametric Bayesian models. Using experimental data generated from matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectroscopy, we model observed intensities in spectra with a hierarchical nonparametric model for expected intensity as a function of time-of-flight. We express the unknown intensity function as a sum of kernel functions, a natural choice of basis functions for modeling spectral peaks. We discuss how to place prior distributions on the unknown functions using Lévy random fields and describe posterior inference via a reversible jump Markov chain Monte Carlo algorithm.

Introduction

The advent of matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectroscopy and the related surface-enhanced laser desorption/ionization (SELDI-TOF) technique allows the simultaneous assay of thousands of proteins, and has transformed research on the protein regulation underlying complex physiological processes. This technology provides the means to detect large proteins in a range of biological samples, from serum and urine to complex tissues such as tumors and muscle. With appropriate statistical analysis, one may explore patterns of protein expression on a large scale in high-throughput studies without prior knowledge of which proteins may be present (Baldwin et al., 2001; Diamandis, 2003; Martin and Nelson, 2001; Petricoin and Liotta, 2003; Petricoin et al., 2002). As such, it becomes a discovery tool, identifying proteins and pathways that are linked to a biological process. In applications, tens to thousands of spectra may be collected, leading to massive volumes of data. Each spectrum contains on the order of tens of thousands of intensity measurements, with an unknown number of peaks representing proteins of specific mass-to-charge ratios.
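The sum-of-kernels representation described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel and made-up peak locations, widths, and heights; the function names (`gaussian_kernel`, `expected_intensity`) and all numerical values are hypothetical, chosen only for illustration, and are not taken from the authors' model or software.

```python
import numpy as np

def gaussian_kernel(t, center, width):
    """Unnormalized Gaussian bump (an assumed kernel choice, for illustration)."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def expected_intensity(t, centers, widths, heights):
    """Evaluate f(t) = sum_j heights[j] * k(t; centers[j], widths[j])."""
    t = np.asarray(t, dtype=float)[:, None]            # shape (n, 1)
    centers = np.asarray(centers, dtype=float)         # shape (J,)
    widths = np.asarray(widths, dtype=float)           # shape (J,)
    heights = np.asarray(heights, dtype=float)         # shape (J,)
    bumps = gaussian_kernel(t, centers, widths)        # shape (n, J) by broadcasting
    return bumps @ heights                             # shape (n,)

# Hypothetical time-of-flight grid and three hypothetical peaks.
tof = np.linspace(0.0, 100.0, 1001)
centers = [20.0, 50.0, 80.0]   # peak locations (arbitrary units)
widths = [1.0, 2.0, 1.5]       # peak widths
heights = [5.0, 3.0, 8.0]      # peak magnitudes

f = expected_intensity(tof, centers, widths, heights)
```

In a full analysis, the number of peaks and each peak's location, width, and height would be unknown and given prior distributions; the sketch above only evaluates the mean function for fixed values.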

[1] M. Trosset, et al. Enhancement of sensitivity and resolution of surface-enhanced laser desorption/ionization time-of-flight mass spectrometric records for serum peptides using time-series analysis techniques. 2005, Clinical chemistry.

[2] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. 1995.

[3] G. Siuzdak. The Expanding Role of Mass Spectrometry in Biotechnology. 2006.

[4] Jeffrey S. Morris, et al. Understanding the characteristics of mass spectrometry data through the use of simulation. 2005, Cancer informatics.

[5] Jeffrey S. Morris, et al. Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. 2005, Proteomics.

[6] Emanuel F. Petricoin, et al. Mass spectrometry-based diagnostics: the upcoming revolution in disease detection. 2003, Clinical chemistry.

[7] Ken-iti Sato. Lévy Processes and Infinitely Divisible Distributions. 1999.

[8] Jeffrey S. Morris, et al. Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. 2005, Bioinformatics.

[9] A. Shiryaev, et al. Limit Theorems for Stochastic Processes. 1987.

[10] Robert L. Wolpert, et al. Simulation of Lévy Random Fields. 1998.

[11] R. Koenker, et al. Computing regression quantiles. 1987.

[12] R. Schilling. Financial Modelling with Jump Processes. 2005.

[13] S. Godsill, et al. Bayesian variable selection and regularization for time–frequency surface estimation. 2004.

[14] Murad S. Taqqu, et al. Fractional Ornstein-Uhlenbeck Lévy processes and the Telecom process: Upstairs and downstairs. 2005, Signal Process.

[15] M. Clyde, et al. Lévy Adaptive Regression Kernels. 2007.

[16] E. Diamandis. Point: Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? 2003, Clinical chemistry.

[17] C. Dass. Principles and Practice of Biological Mass Spectrometry. 2000.

[18] P. Nelson, et al. From genomics to proteomics: techniques and applications in cancer research. 2001, Trends in cell biology.

[19] M. Clyde, et al. Flexible empirical Bayes estimation for wavelets. 2000.

[20] R. Wolpert, et al. Poisson/gamma random field models for spatial statistics. 1998.

[21] A. L. Burlingame, et al. Matrix-assisted laser desorption/ionization coupled with quadrupole/orthogonal acceleration time-of-flight mass spectrometry for protein discovery, identification, and structural analysis. 2001, Analytical chemistry.

[22] Anders Björk, et al. Improved method for peak picking in matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. 2004, Rapid communications in mass spectrometry: RCM.

[23] E. Petricoin, et al. Use of proteomic patterns in serum to identify ovarian cancer. 2002, The Lancet.

[24] M. Clyde, et al. Multiple shrinkage and subset selection in wavelets. 1998.