Improving mass spectrometry peak detection using multiple peak alignment results.

Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false -positive peaks in general do not repeat themselves well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra together, those false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra in order to reduce false-positive peaks. The new method improves the detection of peaks over the traditional single spectrum based peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/ approximately eeyu/mspeak.htm.

[1]  E. Petricoin,et al.  Use of proteomic patterns in serum to identify ovarian cancer , 2002, The Lancet.

[2]  David Ward,et al.  Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data , 2003, Bioinform..

[3]  J. Potter,et al.  A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. , 2003, Biostatistics.

[4]  Jeffrey S. Morris,et al.  A comprehensive approach to the analysis of matrix‐assisted laser desorption/ionization‐time of flight proteomics spectra from serum samples , 2003, Proteomics.

[5]  Jeffrey S. Morris,et al.  Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum , 2005, Bioinform..

[6]  Jeffrey S. Morris,et al.  Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. , 2003, Clinical chemistry.

[7]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[8]  Somnath Datta,et al.  Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens , 2004, Bioinform..

[9]  Jeffrey S. Morris,et al.  Improved peak detection and quantification of mass spectrometry data acquired from surface‐enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform , 2005, Proteomics.

[10]  Hongyu Zhao,et al.  Multiple Peak Alignment in Sequential Data Analysis: A Scale-Space-Based Approach , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[11]  Hongyu Zhao,et al.  Detecting and aligning peaks in mass spectrometry data with applications to MALDI , 2006, Comput. Biol. Chem..

[12]  Robert Tibshirani,et al.  Sample classification from protein mass spectrometry, by 'peak probability contrasts' , 2004, Bioinform..

[13]  A. Pothen,et al.  Protocols for disease classification from mass spectrometry data , 2003, Proteomics.

[14]  O John Semmes,et al.  Serum Protein Profiles to Identify Head and Neck Cancer , 2004, Clinical Cancer Research.

[15]  Igor Kononenko,et al.  Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[16]  Weichuan Yu,et al.  MALDI-MS data analysis for disease biomarker discovery. , 2006, Methods in molecular biology.

[17]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .