Algorithms for alignment of mass spectrometry proteomic data

MOTIVATION: The analysis of biological samples with high-throughput mass spectrometers has increased greatly in recent years. As larger datasets are processed, the spectra must be aligned so that the intensity of the same protein is correctly identified in each sample. Without such an alignment procedure, signals from peptides with similar molecular weight can be misidentified. Two algorithms are provided that improve the alignment among samples. One is designed to work with SELDI data produced by a Ciphergen instrument, and the other can be used with data in a more general format.

RESULTS: The two algorithms were applied to samples drawn from a common pool of reference serum. The results indicate a substantial improvement in the consistency with which peptide signals are identified across samples.
