Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet

Motivation: Peaks are the key information in mass spectrometry (MS), which is increasingly used to discover disease-related proteomic patterns, and peak detection is an essential step in MS-based proteomic data analysis. Several peak detection algorithms have been proposed recently, but they share three major deficiencies: (i) denoising can remove true signal along with the noise; (ii) the baseline removal step may eliminate true peaks and create false ones; (iii) in the peak quantification step, a signal-to-noise ratio (SNR) threshold is usually used to remove false peaks, yet the noise estimates used to compute the SNR are often inaccurate in either the time or the wavelet domain. In this article, we propose new algorithms to address these problems. First, we use a bivariate shrinkage estimator in the stationary wavelet domain to avoid removing true peaks in the denoising step. Second, without baseline removal, we analyze zero-crossing lines across multiple scales of Gaussian derivative wavelets, together with a mixture of Gaussians, to estimate discriminative parameters of peaks. Third, in the quantification step, the frequency, standard deviation (SD), height and rank of peaks are used to detect both high- and low-energy peaks with robustness to noise.

Results: We propose a novel Gaussian Derivative Wavelet (GDWavelet) method that detects true peaks more accurately and with a lower false discovery rate than existing methods. We evaluated GDWavelet on a real Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic datasets with Gaussian and real noise. Results are evaluated with standard receiver operating characteristic (ROC) curves, and in all experiments our method outperforms other commonly used methods.

Availability: http://ranger.uta.edu/~heng/MS/GDWavelet.html or http://www.naaan.org/nhanguyen/archive.htm

Contact: heng@uta.edu
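The zero-crossing idea mentioned under Motivation can be illustrated with a minimal sketch. This is not the GDWavelet implementation: it omits the bivariate shrinkage denoising, the mixture-of-Gaussians step and the quantification rules, and all function names, the scales and the `max_shift` linking tolerance are assumptions made for illustration. It relies on one fact only: a local maximum of the Gaussian-smoothed spectrum appears as a positive-to-negative zero crossing of the Gaussian first-derivative response, and such crossings can be followed from fine to coarse scales without removing the baseline first.

```python
import numpy as np

def gaussian_derivative_kernel(sigma, truncate=4.0):
    """First derivative of a normalized, sampled Gaussian with width sigma."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    return -x / sigma ** 2 * g

def derivative_response(signal, sigma):
    """Derivative of the Gaussian-smoothed signal (edge-padded, same length)."""
    kernel = gaussian_derivative_kernel(sigma)
    radius = (len(kernel) - 1) // 2
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def falling_zero_crossings(response):
    """Indices i where the response goes from > 0 at i to <= 0 at i + 1."""
    return np.flatnonzero((response[:-1] > 0) & (response[1:] <= 0))

def zero_crossing_lines(signal, sigmas, max_shift=3):
    """Link crossings from the finest to the coarsest scale.

    Returns one list of positions per candidate peak; a line stops at the
    first scale where no crossing lies within max_shift samples.
    """
    per_scale = [falling_zero_crossings(derivative_response(signal, s))
                 for s in sigmas]
    lines = [[int(p)] for p in per_scale[0]]
    for depth, crossings in enumerate(per_scale[1:], start=1):
        for line in lines:
            if len(line) != depth or crossings.size == 0:
                continue  # line already ended, or nothing to link to
            nearest = int(crossings[np.argmin(np.abs(crossings - line[-1]))])
            if abs(nearest - line[-1]) <= max_shift:
                line.append(nearest)
    return lines

# Synthetic example: two Gaussian peaks on a slowly rising baseline.
x = np.arange(1000, dtype=float)
signal = (0.001 * x
          + 5.0 * np.exp(-(x - 300.0) ** 2 / (2 * 10.0 ** 2))
          + 3.0 * np.exp(-(x - 700.0) ** 2 / (2 * 15.0 ** 2)))
lines = zero_crossing_lines(signal, sigmas=[2.0, 4.0, 8.0])
for line in lines:
    print("candidate peak near", line[0], "persists over", len(line), "scales")
```

In this sketch, the number of scales over which a line persists is only a rough stand-in for the peak parameters listed in the abstract; on noisy spectra many short lines would appear and would need the kind of filtering the abstract describes.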
